
Adaptive Subtitles (Japanese/English Hybrid Subtitles)

#1
If I watch a movie with English subtitles there are lots of lines that I could easily understand without English subtitles. But if I watch that same movie with Japanese subtitles, there are lots of lines that I won't understand without the English subs. So an idea that I had to get around this would be some sort of adaptive subtitle software.

Adaptive subtitles would get information about known phrases from your Anki decks, search your Japanese subtitle files for those phrases, and then copy/paste each match into the same location in the English subtitle file. What you would end up with is the automated creation of hybrid subtitle files containing English subtitles for what you don't know, and Japanese subtitles for what you do know. This would allow you to reinforce what you know without sacrificing comprehension of the story while watching new movies.

Thoughts?
#2
I think what you ask for is too niche to get someone actually to do it.

I am not sure, but I think that in KMPlayer you can watch 2 subtitle streams at the same time.
#3
I can imagine it being implemented, but it would take a LOT of programming work, and I just don't think the benefit would be worth the time it takes to build bulletproof substitution algorithms and support Anki decks.
#4
It would have some enormous implementation hurdles - there's no particular guarantee that an English and a Japanese subtitle file use identical timing, in fact, they are unlikely to do so as timings are generally applied to an untimed script line by line by hand.

Even if you have identical timings or an algorithm to resolve timing (say, by calculating which titles overlap by > 50% of their display time), long passages may still be broken up into multiple titles for one sentence.

I'm not sure how to even start resolving that problem, but ignoring it will mean that -some- parts of the same sentence will be displayed in each language, and because of the different grammar, the result will probably be incoherent and duplicate some ideas while dropping others.

I suspect the only way to totally resolve all the issues would be to deliberately create dual-language subtitle files in a way that was custom to the adaptive program.

I suppose you could manage with an algorithm that displayed -both- languages when a chain of titles is continuously displayed, and that chain contains both 'known' and 'unknown' phrases.

Definitely a niche product. I think most learners simply go with Japanese subtitles or none at all, and keep learning the missing vocabulary until they can understand the script in its entirety.
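A minimal sketch of the 50% overlap heuristic mentioned above, for illustration only. The `Title` class and the `overlap_fraction` and `pair_titles` functions are hypothetical names, times are assumed to be already parsed into seconds, and the overlap is measured against the Japanese title's display time (one possible reading of the rule). It does not address sentences that are split across several titles.

```python
from dataclasses import dataclass


@dataclass
class Title:
    start: float  # display start, in seconds
    end: float    # display end, in seconds
    text: str


def overlap_fraction(a, b):
    """Fraction of a's display time during which b is also on screen."""
    shared = min(a.end, b.end) - max(a.start, b.start)
    duration = a.end - a.start
    if duration <= 0 or shared <= 0:
        return 0.0
    return shared / duration


def pair_titles(japanese, english, threshold=0.5):
    """Pair each Japanese title with the English titles that overlap it
    by more than `threshold` of the Japanese title's display time."""
    pairs = []
    for ja in japanese:
        matches = [en for en in english if overlap_fraction(ja, en) > threshold]
        pairs.append((ja, matches))
    return pairs


# Made-up timings: the first pair lines up, the second does not.
ja_titles = [Title(0.0, 2.0, "A"), Title(2.0, 4.0, "B")]
en_titles = [Title(0.2, 2.1, "a"), Title(3.5, 6.0, "b")]
pairs = pair_titles(ja_titles, en_titles)
```

A Japanese title that ends up with an empty match list is exactly the case where a tool would have to fall back to showing both languages or leave the line for manual fixing.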
#5
foodcubes Wrote:What you would end up with is the automated creation of hybrid subtitle files containing English subtitles for what you don't know, and Japanese subtitles for what you do know. This would allow you to reinforce what you know without sacrificing comprehension of the story while watching new movies.

Thoughts?
This sounds really interesting. You could do it on a line-by-line basis by checking whether a line is i+0 mature/known (using mature.db/known.db via Morph Man) and selecting the Japanese or English line accordingly. I haven't written any subtitle-processing code, but choosing which line to use would be pretty easy.

Getting 1:1 Japanese:English subs with matching timings is a bit of a pain, but is relatively easy with Aegisub (just takes time).
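The line-by-line selection could look roughly like the sketch below. It is not Morph Man's code: `choose_line` is a hypothetical helper, the known words are a plain Python set rather than a real known.db, and the morpheme lists are split by hand here (a real tool would need a morphological analyzer). The sample lines are taken from elsewhere in this thread.

```python
def choose_line(japanese, english, morphemes, known):
    """Return the Japanese line if every morpheme is known (i+0),
    otherwise fall back to the English line."""
    unknown = [m for m in morphemes if m not in known]
    return japanese if not unknown else english


# Hypothetical set of known morphemes; note that 秘密 is missing.
known = {"これ", "何", "です", "か", "は", "俺", "たち", "だけ", "の", "だ"}

# (Japanese line, English line, hand-split morphemes of the Japanese line)
pairs = [
    ("これ何ですか?", "what's this?", ["これ", "何", "です", "か"]),
    (
        "これは俺たちだけの秘密だ",
        "This will be a secret just between the two of us.",
        ["これ", "は", "俺", "たち", "だけ", "の", "秘密", "だ"],
    ),
]

hybrid = [choose_line(ja, en, ms, known) for ja, en, ms in pairs]
```

With this set, the first line stays in Japanese and the second falls back to English because of the single unknown word.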
Edited: 2011-09-20, 1:12 am
#6
I had the same idea! but then forgot about it... it would be cool if it could be made.

Even filtering by JLPT level, or some other way of choosing a difficulty, would be pretty good.
Edited: 2011-09-20, 2:39 am
#7
I did something similar with pure Japanese subs a while ago: I exported my Anki deck and counted N, the number of unknown words in each line. If N=0, don't display the line; if 0<N<3, create a subtitle with just the unknown words, their readings, and definitions; otherwise display the original line. I'll see if I can find the code somewhere... I've found that the timing of English subs from official sources (Funimation, Viz, etc.) usually matches the dialogue timing exactly.
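A rough sketch of that three-way rule, not the original code. `render_line` is a hypothetical name, N is taken to be the number of unknown words in the line, the morphemes are assumed to be already split, and `glossary` is an assumed dict mapping a word to a (reading, definition) pair.

```python
def render_line(line, morphemes, known, glossary):
    """Apply the N-based rule to one Japanese subtitle line.

    N == 0     -> None (nothing is displayed)
    0 < N < 3  -> only the unknown words, with reading and definition
    otherwise  -> the original line
    """
    unknown = [m for m in morphemes if m not in known]
    n = len(unknown)
    if n == 0:
        return None
    if n < 3:
        parts = []
        for m in unknown:
            # Fall back to "?" when the glossary has no entry for a word.
            reading, meaning = glossary.get(m, ("?", "?"))
            parts.append(f"{m} ({reading}): {meaning}")
        return " / ".join(parts)
    return line


known = {"これ", "は", "俺", "たち", "だけ", "の", "だ"}
glossary = {"秘密": ("ひみつ", "secret")}
gloss = render_line(
    "これは俺たちだけの秘密だ",
    ["これ", "は", "俺", "たち", "だけ", "の", "秘密", "だ"],
    known,
    glossary,
)
```

Here `gloss` contains only the one unknown word with its reading and definition, since the rest of the line is in the known set.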
Edited: 2011-09-20, 9:54 am
#8
overture2112 Wrote:Getting 1:1 Japanese:English subs with matching timings is a bit of a pain, but is relatively easy with Aegisub (just takes time).
Oh, I'm stupid. You can just use subs2srs to generate dueling subs in .ass format, then use a tiny bit of code to remove either the Japanese or the English line depending on i+N. That should solve a good chunk of the slightly-off timing issues; the only problem then is that you have to do a quick run-through in Aegisub to combine lines when the English translation's sentence is split.
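As an illustration of the "tiny bit of code", here is a sketch that filters the `Dialogue:` lines of an .ass file. It assumes the dueling file gives each Japanese/English pair identical start and end times and uses two style names, `Japanese` and `English`; those names are placeholders, so check what your file actually uses. `is_known` stands in for whatever i+N check is applied to the Japanese text, and the sample English line for はいはい is made up.

```python
def parse_dialogue(line):
    """Split an ASS 'Dialogue:' line into (start, end, style, text).
    Returns None for headers, styles, and other non-dialogue lines."""
    if not line.startswith("Dialogue:"):
        return None
    # The text is the 10th field and may itself contain commas.
    fields = line[len("Dialogue:"):].strip().split(",", 9)
    if len(fields) < 10:
        return None
    _layer, start, end, style = fields[:4]
    return start, end, style, fields[9]


def filter_dueling(lines, is_known, ja_style="Japanese", en_style="English"):
    """Keep the Japanese line of each pair when it is known, else the English one."""
    # First pass: decide per time slot, based on the Japanese text.
    keep_japanese = {}
    for line in lines:
        parsed = parse_dialogue(line)
        if parsed and parsed[2] == ja_style:
            start, end, _style, text = parsed
            keep_japanese[(start, end)] = is_known(text)

    # Second pass: drop the line of each pair that should be hidden.
    out = []
    for line in lines:
        parsed = parse_dialogue(line)
        if parsed is None:
            out.append(line)  # headers, styles, etc. pass through
            continue
        start, end, style, _text = parsed
        known = keep_japanese.get((start, end))
        if known is None or style not in (ja_style, en_style):
            out.append(line)  # unpaired or other styles: leave untouched
        elif (style == ja_style) == known:
            out.append(line)
    return out


lines = [
    "[Events]",
    "Dialogue: 0,0:00:01.00,0:00:03.00,Japanese,,0,0,0,,はいはい",
    "Dialogue: 0,0:00:01.00,0:00:03.00,English,,0,0,0,,Yes, yes.",
    "Dialogue: 0,0:00:04.00,0:00:06.00,Japanese,,0,0,0,,これは俺たちだけの秘密だ",
    "Dialogue: 0,0:00:04.00,0:00:06.00,English,,0,0,0,,This will be a secret just between the two of us.",
]
result = filter_dueling(lines, lambda text: text == "はいはい")
```

Matching on exact start/end times only works if the two languages were timed together; split English sentences still need the manual pass described above.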
Edited: 2011-09-20, 12:25 pm
#9
Sebastian Wrote:I think what you ask for is too niche to get someone actually to do it.
Never fear, awesome ideas deserve to be realized. I've just added this feature to Morph Man 2.05.

I haven't gotten to experiment a ton with it yet and the implementation is a bit hackish, but it worked for the various K-On! dueling subs I made. Hopefully people can play around with it and suggest improvements or give example files it fails to work on.
#10
Wow, a lot of good ideas here!

overture2112, it seems like you've already developed the solution. I haven't used Morph Man before but if I'm understanding it correctly your morphemes algorithm can predict the likelihood that you will know a card. And so instead of predicting which cards you know, you will predict which lines you know. I'm a little pressed for time right now, and it looks like there's a little learning curve for Morph Man, but I will do my best to test that as soon as I can.
#11
You've already gotten it to work, so we know it works. It took me a week to learn to make subs2srs decks, so of course I'm having some trouble here. I think I'm missing a known db or not configuring something right. Here's what I've done:

1. Create a dueling subtitle file in subs2srs
2. Download Morph Man 2
3. Morph Man -> Adaptive Subs -> Convert Subs -> (Select dueling subs file) -> Select new file location -> Convert Subs...


An error occurred in a plugin. Please contact the plugin author.
Please do not file a bug report with Anki.

Traceback (most recent call last):
  File "C:\Documents and Settings\Administrator\Application Data\.anki\plugins\morph\manager.py", line 65, in onGo
    adaptiveSubs.run( inFile, outFile, ws, bs, mFmt, kFmt, uFmt )
  File "C:\Documents and Settings\Administrator\Application Data\.anki\plugins\morph\adaptiveSubs.py", line 26, in run
    kdb = M.MorphDb( knownDbPath )
  File "C:\Documents and Settings\Administrator\Application Data\.anki\plugins\morph\morphemes.py", line 147, in __init__
    self.load( path )
  File "C:\Documents and Settings\Administrator\Application Data\.anki\plugins\morph\morphemes.py", line 179, in load
    f = gzip.open( path, 'rb' )
  File "C:\cygwin\home\dae\Home\anki\win\build\pyi.win32\anki\outPYZ1.pyz/gzip", line 34, in open
  File "C:\cygwin\home\dae\Home\anki\win\build\pyi.win32\anki\outPYZ1.pyz/gzip", line 89, in __init__
IOError: [Errno 2] No such file or directory: 'C:\\Documents and Settings\\Administrator\\Application Data\\.anki\\plugins\\morph\\dbs\\known.db'
#12
foodcubes Wrote:IOError: [Errno 2] No such file or directory: 'C:\\Documents and Settings\\Administrator\\Application Data\\.anki\\plugins\\morph\\dbs\\known.db'
Yep, it looks like it didn't finish creating your known.db.

You'll need to go into the configuration for a deck (or all of them) and enable it so Morph Man will analyze that deck to flesh out your known.db/mature.db (it will also set various fields on them if they're available and modify the creation times to make new cards appear in a useful order). Once it's done you should be able to run the adaptive subs feature.
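For anyone scripting around this, the missing-file case can be caught before `gzip.open` is reached. The sketch below is not part of Morph Man: `open_known_db` is a hypothetical wrapper, written for Python 3 (the plugin in the traceback runs on an older Python), and it assumes nothing about what is stored inside the file.

```python
import gzip
import os


def open_known_db(path):
    """Open a gzip-compressed db file, failing with a clearer message
    when the file has not been created yet."""
    if not os.path.isfile(path):
        raise FileNotFoundError(
            f"{path} does not exist yet; enable Morph Man for at least one "
            "deck and let it finish analyzing before running adaptive subs"
        )
    return gzip.open(path, "rb")
```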
#13
Tried: Morph Man -> Control Auto -> Enabled? "yes" -> Save cfg -> But there is still no known.db file.

Tried: Morph Man -> Save results to db -> resulted in an error


Maybe there was an error during installation, if a known.db was supposed to be created automatically?
#14
foodcubes Wrote:Tried: Morph Man -> Control Auto -> Enabled? "yes" -> Save cfg -> But there is still no known.db file.
After doing this you need to keep Anki running and make sure the deck itself isn't open in Anki. The config page should show the last time Morph Man updated the deck as well as the last time it did a full update of everything. Alternatively, you can look at {plugins dir}/morph/tests/auto.log to check progress or see if there was some problem.
#15
Alright, Morph Man adaptive subs exported a file with subs like the following:

これは俺たちだけの秘密だ ![This will be a secret just between the two of us.]
He'd never look at me that way. [ねー 得る ないし 相手] 4


* It looks like it changes the order if I know or don't know the sentence (Japanese vs English)
* Second language is still present, but within brackets
* Sometimes a number is present at the end of the sentence, typically 1-4.
#16
foodcubes Wrote:Alright, Morph Man adaptive subs exported a file with subs like the following:

これは俺たちだけの秘密だ ![This will be a secret just between the two of us.]
He'd never look at me that way. [ねー 得る ないし 相手] 4


* It looks like it changes the order if I know or don't know the sentence (Japanese vs English)
* Second language is still present, but within brackets
* Sometimes a number is present at the end of the sentence, typically 1-4.
Yep, the default format is the "debugging" template iirc, which is described here:

http://rtkwiki.koohii.com/wiki/Morph_Man...ats_to_try

The "minimalist" format is probably what you ideally want, but I set the default to the debugging one since you'll probably have to check for these problems before using it:

* subs aren't aligned properly
* English subs don't match the Japanese sub closely enough, causing confusion as to why the sentence seems trivial but isn't considered known
  * the most common culprit I find is that the English subs swap names for pronouns, and the names aren't known
* your known.db is missing a lot of words you do, in fact, know.

And the debugging template makes it far easier to check for those problems.
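If you want to check a whole file rather than eyeball it, lines in the shape of the two samples quoted above can be pulled apart with a regular expression. This sketch only assumes that shape: displayed text, an optional `!`, the other language in square brackets, and an optional trailing number. What the `!` and the number mean is covered by the wiki page linked above and is not assumed here; `split_debug_line` is a hypothetical helper.

```python
import re

# displayed text, optional "!", bracketed text, optional trailing number
DEBUG_LINE = re.compile(
    r"^(?P<shown>.*?)\s+(?P<flag>!)?\[(?P<hidden>.*)\](?:\s+(?P<count>\d+))?$"
)


def split_debug_line(line):
    """Return (shown, hidden, has_flag, count), or None if the line
    does not have the expected shape."""
    m = DEBUG_LINE.match(line.strip())
    if m is None:
        return None
    count = int(m.group("count")) if m.group("count") else None
    return m.group("shown"), m.group("hidden"), m.group("flag") == "!", count


first = split_debug_line(
    "これは俺たちだけの秘密だ ![This will be a secret just between the two of us.]"
)
second = split_debug_line("He'd never look at me that way. [ねー 得る ないし 相手] 4")
```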
#17
Awesome job, overture2112! Other than the short learning curve, it's quick and easy to run a file through Morph Man's adaptive subs. People will no longer have to sacrifice language learning or storyline when watching new movies. Or, I should say, I will no longer have to read things like "what's this?" because I can now read "これ何ですか?", even if the rest of the language is way too complex to follow with Japanese subs.


Example output:

Why do we have to look at Mr. Hirata's face even while we're eating
We can't think up good ideas like that.
そうですよね

How about being a little more involved with work?
何もするなって言ったのお兄さまですよ
遊んでいいとは誰も言ってないぞ
はいはい



One note for now: the Morph Man-enabled deck should include the easy little words and phrases that you might otherwise delete. Rather than deleting them, just mark them as easy a few times and they won't come up for review. This way Morph Man will know that you know the word/phrase.