aligner: Fully automatic subtitle correction

#1
I'm happy to announce aligner, a program that automatically corrects the timing of a subtitle file when you provide a "reference subtitle" (possibly in another language). It figures out the best offsets and where to insert or remove advertisement breaks.

You can use it like this:

Code:
aligner reference_subtitle.ssa incorrect_subtitle.srt output.srt

Of course, you can call it in a bash for loop, which lets you fix a whole series in a matter of seconds.
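
For anyone without bash (for example on Windows), the same batch idea can be sketched in Python. This is only an illustration: it assumes that the reference and the incorrect subtitle of each episode share a base name (ep01.ssa and ep01.srt), and build_commands and run_all are hypothetical helper names, not part of aligner. The only thing taken from the post above is the three-argument call.

```python
import subprocess
from pathlib import Path


def build_commands(reference_files, incorrect_files, out_dir, aligner="aligner"):
    """Return one aligner command (as an argument list) per matching pair.

    Assumption: a reference and an incorrect subtitle belong together
    when they share the same base name, e.g. ep01.ssa and ep01.srt.
    """
    refs_by_stem = {Path(p).stem: Path(p) for p in reference_files}
    commands = []
    for inc in sorted(Path(p) for p in incorrect_files):
        ref = refs_by_stem.get(inc.stem)
        if ref is None:
            continue  # no reference with the same base name, skip it
        out = Path(out_dir) / inc.name
        # Same argument order as in the post: reference, incorrect, output.
        commands.append([aligner, str(ref), str(inc), str(out)])
    return commands


def run_all(commands):
    """Run the commands one after another, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

A call such as run_all(build_commands(Path("ref").glob("*.ssa"), Path("inc").glob("*.srt"), "fixed")) would then process a whole directory, provided the "fixed" directory already exists and aligner is on the PATH.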

The resulting file can be played directly with your favorite video player or used with subs2srs, making card creation a breeze!


EDIT: A 64-bit Windows binary can be found here.
Edited: 2017-02-18, 2:42 pm
#2
Wow! I can't wait to try this out.
#3
This sounds great! I've been having all sorts of headaches manually adjusting subtitles recently.
#4
Looks amazing!

I'm not familiar with Rust yet. How does this work on Windows machines? Does it compile to an exe, or is it interpreted like Python?
#5
Rust compiles to native machine code, which is needed for speed (this program is computationally expensive). I developed it on Linux, but the source code should also compile on Windows and macOS (though each platform needs its own binary).

If someone manages to build a Windows binary, it would be great if they posted it here on the forum so other users can get started right away.
#6
Okay, I researched a bit and found that cross compilation (compiling a Windows binary on Linux) is quite easy with Rust: you can download a Windows 64-bit binary from here.
#7
I just found out that ffmpeg provides scene/cut detection. If we convert the output of

Code:
ffmpeg -i "$VIDEO_FILE" -filter:v "select='gt(scene,0.4)',showinfo" -f null -

to a .srt file (the dialog lines can be arbitrary), aligner can also align a subtitle to a video. This is untested but probably yields acceptable results. It would be really interesting if somebody tried that :)
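
A rough sketch of that conversion step, equally untested against real videos. It assumes the ffmpeg log (which ffmpeg writes to stderr) was saved to a text file and that every selected frame appears in it with a "pts_time:" field, as showinfo normally prints. The one-second cue length and the placeholder text "cut" are arbitrary choices, and cuts_to_srt and format_srt_time are hypothetical helper names.

```python
import re

# showinfo prints one log line per selected frame, including "pts_time:<seconds>".
PTS_TIME = re.compile(r"pts_time:\s*([0-9]+(?:\.[0-9]+)?)")


def format_srt_time(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def cuts_to_srt(showinfo_log, cue_length=1.0):
    """Turn every detected cut into one short SRT entry.

    cue_length (seconds) is a guess; the text of each entry is
    arbitrary because only the timing matters for alignment.
    """
    times = [float(m.group(1)) for m in PTS_TIME.finditer(showinfo_log)]
    entries = []
    for number, start in enumerate(times, start=1):
        end = start + cue_length
        entries.append(
            f"{number}\n{format_srt_time(start)} --> {format_srt_time(end)}\ncut\n"
        )
    return "\n".join(entries)
```

The resulting text could be written to something like cuts.srt and passed to aligner as the reference subtitle. Whether short fixed-length cues are the best way to represent cuts is an open question.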
Edited: 2017-02-18, 4:06 am
#8
Oh my god, I haven't tried this out yet but this seems like it could be amazing. If someone could provide a windows binary that would be great. There are a lot of anime series with untimed subs on kitsunekko but retiming them is a real pain in the ass so this would help out tons.
#9
I already compiled the windows binary. I edited the first post so you can find the link without searching.

And yes: the main motivation is kitsunekko subtitles (*sigh* they are almost never correct).
#10
(2017-02-18, 2:33 pm)Xavier22 Wrote: Oh my god, I haven't tried this out yet but this seems like it could be amazing.  If someone could provide a windows binary that would be great.  There are a lot of anime series with untimed subs on kitsunekko but retiming them is a real pain in the ass so this would help out tons.

The Windows binary is linked a couple comments above this one.

This tool is really, really amazing. I started years ago trying to align a bunch of Japanese subs with my English-subbed One Piece DVDs, but gave up after a few episodes because it was so fiddly. Now I can generate them easily.

A couple of minor snags I've run into (for the user, not for kaegi):

If you're trying to align subs that don't quite match up (for instance, my Japanese subs have subs for the intro voiceover but few or no subs for the opening credits, while the English has extensive subs for the opening credits), you may end up with the app reporting "negative" subtitles at the beginning. I found the easiest way to fix this was just to lop the opening subtitles off both, to make sure they were starting at the same line. You can open "output" in Subtitle Edit or something similar to lop off the extra stuff after aligning, even.

Because the English subs sometimes seem to appear on screen slightly after the characters begin speaking, I had a lot of trouble with audio being cut off at the beginning when outputting to srs. The best fix for this is to build in lots of extra padding time in subs2srs, both before and after the lines.
#11
Thanks, somehow I missed the link. I know how to extract subs from mkvs, so getting the timed English subs isn't a problem. But how exactly would I use this program on Windows with the binary, individually and/or in batch?
#12
Quote:If you're trying to align subs that don't quite match up (for instance, my Japanese subs have subs for the intro voiceover but few or no subs for the opening credits, while the English has extensive subs for the opening credits), you may end up with the app reporting "negative" subtitles at the beginning. I found the easiest way to fix this was just to lop the opening subtitles off both, to make sure they were starting at the same line. You can open "output" in Subtitle Edit or something similar to lop off the extra stuff after aligning, even.
I'm not quite sure about your requirements, but you don't have to do that (in most cases). The problem with negative timestamps is that the subtitle formats don't specify how they are written into the file.

So minus 4 minutes and 5 seconds can be represented as "-04:05.000", "-04:-05.000" or even "-05:55.000" (meaning -5 minutes plus 55 seconds). Your video player has multiple ways of interpreting these timestamps, or might even reject them. The program handles this case by moving the negative timespans so that they start at second zero. So all "negative" lines will be played in the first few seconds, which should be acceptable in most cases.
What is your use case for truncating the file? Or is the warning message ambiguous?
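
To illustrate the shifting described above, here is a minimal sketch. It assumes that every line with a negative start is moved forward individually so that it starts at zero and keeps its duration; the program's actual code may differ, and clamp_negative_spans is a hypothetical name.

```python
def clamp_negative_spans(spans_ms):
    """Move every timespan that starts before zero so that it starts at zero.

    spans_ms is a list of (start, end) pairs in integer milliseconds.
    The duration of each line is kept, so several "negative" lines end
    up overlapping in the first few seconds.
    """
    fixed = []
    for start, end in spans_ms:
        if start < 0:
            end -= start  # shift the end forward by the same amount
            start = 0
        fixed.append((start, end))
    return fixed
```

For example, a two-second line starting at minus 4 minutes and 5 seconds, (-245000, -243000), becomes (0, 2000).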

(2017-02-18, 3:07 pm)Xavier22 Wrote: Thanks, somehow I missed the link.  I know how to extract subs from mkvs, so I don't have a problem with that as for getting the timed English subs.  But how exactly would I use this program on Windows with the binary, individually and/or in batch?

Via cmd.exe or PowerShell (raise your command-line skills to the next level!). I don't know the exact syntax for loops in cmd.exe scripts; somebody else might be able to help you more with that.
Edited: 2017-02-18, 3:16 pm
#13
Figured it out, thanks!  Haven't figured out batch yet but not super important at the moment.  To try out the program I used Aikatsu episode 1 (Coalgirls) as a reference, with the SRT on kitsunekko.  It worked great for the first minute and a half but got screwed up at the OP.  In the Coalgirls subs there are subs for the 1:30 minute OP, but in the kitsunekko subs there is a 3 minute gap (likely for the OP+commercials).  What ended up happening is that the subs ended up being pulled back to match the OP lyrics and the rest of the episode was out of sync.  Is there any way to fix this, maybe playing around with the split-penalty parameter?
#14
(2017-02-18, 3:39 pm)Xavier22 Wrote: Figured it out, thanks!  Haven't figured out batch yet but not super important at the moment.  To try out the program I used Aikatsu episode 1 (Coalgirls) as a reference, with the SRT on kitsunekko.  It worked great for the first minute and a half but got screwed up at the OP.  In the Coalgirls subs there are subs for the 1:30 minute OP, but in the kitsunekko subs there is a 3 minute gap (likely for the OP+commercials).  What ended up happening is that the subs ended up being pulled back to match the OP lyrics and the rest of the episode was out of sync.  Is there any way to fix this, maybe playing around with the split-penalty parameter?

Yep, I think a value between 0.8 and 2 will probably work for you.

The default value is 4, but that is not very battle-tested. If you often need higher or lower values, feel free to share them.
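
One way to try several values in one go is sketched below. The flag spelling "--split-penalty" is an assumption based on the parameter name used in this thread, so please check the program's help output first. sweep_commands and the output naming scheme are hypothetical.

```python
import subprocess
from pathlib import Path


def sweep_commands(reference, incorrect, penalties=(0.8, 1.0, 1.5, 2.0),
                   aligner="aligner"):
    """Build one aligner call per split-penalty value.

    Each call writes to its own output file (e.g. inc_p0.8.srt) so the
    results can be compared in a video player afterwards.
    """
    commands = []
    for penalty in penalties:
        out = f"{Path(incorrect).stem}_p{penalty}.srt"
        # ASSUMPTION: the option is spelled --split-penalty.
        commands.append(
            [aligner, reference, incorrect, out, "--split-penalty", str(penalty)]
        )
    return commands


def run_sweep(commands):
    """Run every command; check=True stops at the first failing call."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling run_sweep(sweep_commands("ref.ass", "inc.srt")) would leave four candidate files next to each other, one per value in the suggested range.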
Edited: 2017-02-18, 3:49 pm
#15
Re: the use case, I didn't want the opening narration subs all piled up on top of each other in the beginning, which is what seemed to be happening when they were zeroed out. Although, if they would just disappear after a second, maybe I wouldn't notice them. I was only trying out the preview stage, and deleting the opening lines was the easiest solution anyway.

Xavier's problem with the gap for the OP is akin to mine, though it sounds like the results he got were different. I'll have to try out the split-penalty parameter, I haven't tried that yet.

Anyway, thanks for an awesome tool!
#16
(2017-02-18, 3:45 pm)kaegi Wrote: Yep, I think a value between 0.8 and 2 will probably work for you.

The default value is 4, but that is not very battle-tested. If you often have to take higher or lower values feel free to share them.

I tried out a bunch of different stuff, and nothing worked.  So then I tried going into the reference subtitle file and deleting the OP subs to see if that fixed it - and they weren't there!  Then I realized that Coalgirls release was using split .mkvs where the OP and ED are in separate files that are automatically loaded in to save on filesize.  So the program was doing its job, it just didn't handle that weird abnormality right.  I'm going to go test it with a normal release and see how it works.
#17
You can now upload edge cases/non-working subtitles here (after making sure the file is valid and trying high and low split-penalty values). That way I will get a better overview of real-world problems.

Please name these files like this, so I can easily find corresponding ones:

SameFileName_inc.srt|ass|idx
SameFileName_ref.srt|ass|idx
Edited: 2017-02-18, 5:25 pm
#18
Just tested it with Kitsunekko's Love Lab ep 1 and Commie's TV release of ep 1 that I previously had to retime myself.  Works like a charm, brilliant stuff, thanks so much OP.  This will make a big difference for a lot of people to help get their listening skills up to par with their reading and enjoy anime/dramas/whatever.  If someone could explain how to do this in batch in cmd then it would be perfect.
#19
I had several issues with subs2srs and subtitle timings in the past. This sounds super useful.
#20
Hi kaegi, thank you for working on this tool. I've tested out this program and I had one feature request: it would be nice if it automatically ignored blank lines in .srt files. By blank lines I mean it has a subtitle entry number, timing information, but then no text where the subtitle text should be. Right now I have to open up the srt file in aegisub and then delete blank lines before running it through aligner. It'd be great if the program could automatically do that step. I know blank lines aren't supposed to exist in the srt file to begin with, but at least for the JPsubber files I tested it on, there were at least a few blank lines in each episode.
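
The manual cleanup step described above could be scripted roughly as follows. This is a sketch under simple assumptions: entries are separated by blank lines, each entry has an index line and a timing line followed by its text, and the text was already decoded to a Python string. drop_blank_entries is a hypothetical name, and the remaining entries are renumbered.

```python
import re


def drop_blank_entries(srt_text):
    """Remove SRT entries that have an index and timing but no text.

    Returns the cleaned SRT text with the remaining entries renumbered
    from 1.
    """
    normalized = srt_text.replace("\r\n", "\n").strip()
    blocks = re.split(r"\n\s*\n", normalized)
    kept = []
    for block in blocks:
        lines = block.split("\n")
        # lines[0] is the entry number, lines[1] the timing line.
        text = [line for line in lines[2:] if line.strip()]
        if len(lines) >= 2 and text:
            kept.append((lines[1], text))
    entries = []
    for number, (timing, text) in enumerate(kept, start=1):
        entries.append(f"{number}\n{timing}\n" + "\n".join(text) + "\n")
    return "\n".join(entries)
```

Subtitle text that itself contains empty lines would be cut off by this simple splitting, so it is worth checking the result before overwriting the original file.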
#21
(2017-02-19, 4:15 am)harahachibu Wrote: Hi kaegi, thank you for working on this tool.  I've tested out this program and I had one feature request: it would be nice if it automatically ignored blank lines in .srt files.  By blank lines I mean it has a subtitle entry number, timing information, but then no text where the subtitle text should be.  Right now I have to open up the srt file in aegisub and then delete blank lines before running it through aligner.  It'd be great if the program could automatically do that step.  I know blank lines aren't supposed to exist in the srt file to begin with, but at least for the JPsubber files I tested it on, there were at least a few blank lines in each episode.

Aaaaaand done! This was more in the bug realm than the feature realm and needed a single character change. The link now provides the latest binary.
#22
(2017-02-19, 4:30 am)kaegi Wrote:
(2017-02-19, 4:15 am)harahachibu Wrote: Hi kaegi, thank you for working on this tool.  I've tested out this program and I had one feature request: it would be nice if it automatically ignored blank lines in .srt files.  By blank lines I mean it has a subtitle entry number, timing information, but then no text where the subtitle text should be.  Right now I have to open up the srt file in aegisub and then delete blank lines before running it through aligner.  It'd be great if the program could automatically do that step.  I know blank lines aren't supposed to exist in the srt file to begin with, but at least for the JPsubber files I tested it on, there were at least a few blank lines in each episode.

Aaaaaand done! This was more in the bug realm than the feature realm and needed a single character change. The link now provides the latest binary.

Thanks kaegi, very fast turnaround!

I have run into an issue when the program tries to parse a file. I believe the srt file is correct, but for some reason it crashes on the very last subtitle on the reference file, with an "expected end of input" error. I have uploaded the offending files to you with the filenames "hanasakimai2_01_ref.srt" and "hanasakimai2_01_inc.srt"

I can upload more files from this series that seem to be running into the same issue if you would like.

The error message is below:
Quote:aligner hanasakimai2_01_ref.srt hanasakimai2_01_inc.srt hanasaki01-fixed.srt
EE: error: operation on file 'hanasakimai2_01_ref.srt' failed
EE: caused by: Parse error at line: 2966, column: 1; Unexpected `7`; Expected `end of input`
EE: note: run program with `env RUST_BACKTRACE=1` for a backtrace
#23
(2017-02-19, 11:33 am)harahachibu Wrote: The error message is below:
Quote:aligner hanasakimai2_01_ref.srt hanasakimai2_01_inc.srt hanasaki01-fixed.srt
EE: error: operation on file 'hanasakimai2_01_ref.srt' failed
EE: caused by: Parse error at line: 2966, column: 1; Unexpected `7`; Expected `end of input`
EE: note: run program with `env RUST_BACKTRACE=1` for a backtrace

Okay, the problem is that every line is supposed to end with a newline character (which the last line of your reference subtitle does not have). So the last line doesn't count as a "line", and the parser therefore expects the end of the file at that point. You can just add an empty line at the end as a temporary workaround.
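
If many files are affected, the workaround can be automated. The sketch below works on raw bytes so that it does not need to know the file's encoding; it assumes an ASCII-compatible encoding such as UTF-8 or Shift-JIS (it would not be correct for UTF-16). Both function names are hypothetical.

```python
from pathlib import Path


def ensure_trailing_newline(data):
    """Return the bytes unchanged if they end with a newline, else append one."""
    return data if data.endswith(b"\n") else data + b"\n"


def fix_file(path):
    """Rewrite the file in place only if its final newline is missing."""
    p = Path(path)
    original = p.read_bytes()
    fixed = ensure_trailing_newline(original)
    if fixed != original:
        p.write_bytes(fixed)
```

For example, fix_file("hanasakimai2_01_ref.srt") would append the missing newline to the reference file from the error message.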

By the way: I've never seen offsets as nice as the ones in your subtitles (exactly 30 seconds, 2:30, 4:30, ...). Even I am sometimes surprised by how well the algorithm works. Math is beautiful :)
#24
(2017-02-19, 2:47 pm)kaegi Wrote:
(2017-02-19, 11:33 am)harahachibu Wrote: The error message is below:
Quote:aligner hanasakimai2_01_ref.srt hanasakimai2_01_inc.srt hanasaki01-fixed.srt
EE: error: operation on file 'hanasakimai2_01_ref.srt' failed
EE: caused by: Parse error at line: 2966, column: 1; Unexpected `7`; Expected `end of input`
EE: note: run program with `env RUST_BACKTRACE=1` for a backtrace

Okay the problem is that every line is supposed to end with a newline character (which the last line of your reference subtitle does not have). So the last line doesn't count as "line" and therefore the "end of file" is expected. You can just add the empty line at the end as a temporary workaround.

By the way: I've never seen such nice offsets like in your subtitles (exactly 30 seconds, 2:30, 4:30, ...). Even I am sometimes surprised how well the algorithm works. Math is beautiful :)

Thank you, that fixed the issue! Thanks again for such a great tool. Using Japanese subtitles is a great learning tool for me, and now I can piggyback on the timing work that people put in for the English subtitles to automatically re-time the Japanese subtitle dumps that are out there.
#25
aligner now also correctly handles .srt files that lack that extra empty line at the end. The fresh binary can be found via the link in the first post.