Use subs2srs to Create Anki Decks Based on Your Favorite Movie or Show

#51
Though this is a great tool, I suspect swapping files will be limited. Since what you are doing is effectively memorizing every line in a movie, you're only going to do it with real favourites, since you have to listen to it over and over again. Plus think: 1,500+ cards for this movie alone, so five movies at that rate is 7,500 cards give or take, minus small and duplicate files. That's a huge number of cards. So assuming everyone wants cards for a handful of movies, the amount of overlap between different people's preferences will be minimal, and therefore the amount of sharing limited. It's more of something we'd each do for ourselves, while offering any decks we had done, just in case it was someone else's favourite. Just a thought.
#52
timcampbell Wrote:Though this is a great tool, I suspect swapping files will be limited. Since what you are doing is effectively memorizing every line in a movie, you're only going to do it with real favourites, since you have to listen to it over and over again. Plus think: 1,500+ cards for this movie alone, so five movies at that rate is 7,500 cards give or take, minus small and duplicate files. That's a huge number of cards. So assuming everyone wants cards for a handful of movies, the amount of overlap between different people's preferences will be minimal, and therefore the amount of sharing limited. It's more of something we'd each do for ourselves, while offering any decks we had done, just in case it was someone else's favourite. Just a thought.
You don't have to use every sentence from each deck. I'll surely be suspending quite a few cards, or perhaps just searching for kanji compounds, or just using what lines I think are cool, or a few other things. I think it'd be cool though to collate a bunch of decks from a specific genre or medium (scifi, or dramas, for example), and then establish a deck made from a frequency list of however many cards (either kanji or vocabulary or both) based on that (and then apply the same logic of keeping/discarding whatever one wants).
Edited: 2009-02-02, 11:33 pm
#53
Indeed, I think that it's better to be selective about what you want to input into your deck. I don't think there's much value in adding 1500 cards for every film. It would be great to have a facility that lets you review the cards that are created and allows you to quickly check the ones you want to import into your deck. Although importing and deleting ones you don't like is a good method too.

Beginners might want to choose a film they love and shove the entire thing into Anki and slowly work their way through it. But only if they enjoy it. I think intermediate learners would be better served by choosing a film they love and creating a selection of cards that they want to learn. Intermediate learners should be able to read the cards quickly and make a swift judgment about whether it's too difficult, too easy or too obscure to stick in your Anki deck. That's what I plan to do anyway.
#54
I rarely take more than 30 sentences from any one thing, but even that will still be vastly faster with this tool than individually typing, capping and recording the audio for each piece manually (I know from intensely annoying experience).

cb4960 Wrote:Expect it in the next release.
Sweet, thanks.
#55
I'm normally more selective about what I put into Anki, sometimes so reviews don't pile up, but in a case like this mainly because the effort to input it manually would be huge.
I'm about 250 cards through the deck already, and I've deleted about 20, ums and ahs and such. I'm slowly deleting the smaller sentences that come up for review the second or third time around. Some of them are quite simple and eliminating them lets me focus on the longer, harder sentences. Not sure what the final number of cards will be - likely less than half, maybe a quarter. Agreed, though, that how many you keep depends on your level, and interests.

@nest0r - collecting a specific genre is a cool idea, say if you really wanted to break into samurai or yakuza movies, a deck based on their unique words and accents would be killer.
#56
THIS IS AMAAAAAAAAZING!! OMG!! WOW!! Thank you so much for making such an invaluable program! I have been doing the Toki wo Kakeru Shoujo deck for the past hour and am so amazed at how great this is for EXPERIENCING real life speech! Once again, thank you so much. I'm going to spread the word about this to many people and hopefully we will all one day have many many decks to choose from.

****************************************************************
****************************************************************
****************************************************************
Everyone, now that we have this GREAT learning tool let us all collaborate and share what our favorite Japanese movies / shows are by sharing the Anki decks that we have created using this amazing software.

I will personally make a website specifically for all of us to have a place to display our anki decks. I'll have a very nice picture for each and every deck and a description for the movie / show that was made into an Anki deck. Think of it as Amazon.com meets Anki!

What do you guys think??!!?? Any comments or suggestions are highly appreciated!

You guys can contact me directly at crystalcastlecreature@live.com if you would like.

Thanks! And I look forward to updating you guys in a week or so, so you guys can see how the site looks.

:)

-Jake
#57
I'm debating internally on how I'll ultimately use this. What I decided after an initial pass through it was: NO ENGLISH SUBS FILES. I looked at it with Rookies and it just seemed pointless.

Should I use it like I'm doing iKnow? I don't think so. With iKnow lists, I'm learning very specific sets of vocabulary. With those, it's about writing out the word correctly or reading it correctly.

Anyway, my current mindset when I actually get to using these is: Shadowing. I haven't tried it yet so it's bound to change. However, after I'm done with RTK3, iKnow Core 2000 and Tae Kim's advanced grammar section, I'll have more time to add on a Shadowing project.

I'll put in a request on Anki's group for a "Delete" and "Suspend" button on the main screen, visible via an options check. I figure if I'm shadowing, may as well just shadow the guys, so delete the lady's lines. I have so much material available, this should not be an issue. Yeah, I could mark the card for later deletion, but "delete" button seems useful.

Some things to look into for the subs2srs program:

1. Timing segments. There may be issues with timing of the .srt file to the video. Maybe create an advanced option to process only part of the file. This is also useful for partially processing a show to ensure the settings work without waiting an entire half hour for it to be done.

2. Timing offsets. Plays off the above, as .srt may not be in sync with all parts of the video (commercial inserts for example), so an advanced option to move timing of .srt file forward or backward.

3. Exclude all kana/number/special character lines: I'm probably assuming too much here, but likely any line that's entirely kana and/or alphanumeric is not a line you're overly concerned with.

4. Exclude sentences with only words from this: .txt file. Here's my thinking, the program can reject duplicate lines. What if it can reject lines where all words in it are from this list you give it ie nothing new. Not sure if it's feasible. I would use it with words from iKnow for example.
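
Suggestion 4 could be sketched roughly as follows. This is only an illustration, not how subs2srs does it: the function names (`has_new_word`, `filter_lines`, `load_known_words`) and the one-word-per-line .txt layout are assumptions, and the default tokenizer simply splits on whitespace, which real Japanese text would not survive without a morphological analyzer.

```python
from typing import Callable, Iterable, List, Set


def has_new_word(line: str, known: Set[str],
                 tokenize: Callable[[str], List[str]] = str.split) -> bool:
    """Return True if at least one token of the line is NOT in the known set."""
    return any(token not in known for token in tokenize(line))


def filter_lines(lines: Iterable[str], known: Set[str],
                 tokenize: Callable[[str], List[str]] = str.split) -> List[str]:
    """Keep only the lines that contain something new.

    Lines with no tokens at all (empty lines) are dropped too.
    """
    return [line for line in lines if has_new_word(line, known, tokenize)]


def load_known_words(path: str) -> Set[str]:
    """Read known words from a UTF-8 text file, one word per line (assumed layout)."""
    with open(path, encoding="utf-8") as handle:
        return {word.strip() for word in handle if word.strip()}
```

Passing a different `tokenize` callable is the hook for plugging in a proper word splitter later.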
#58
Nukemarine Wrote:Some things to look into for the subs2srs program:

1. Timing segments. There may be issues with timing of the .srt file to the video. Maybe create an advanced option to process only part of the file. This is also useful for partially processing a show to ensure the settings work without waiting an entire half hour for it to be done.

2. Timing offsets. Plays off the above, as .srt may not be in sync with all parts of the video (commercial inserts for example), so an advanced option to move timing of .srt file forward or backward.

3. Exclude all kana/number/special character lines: I'm probably assuming too much here, but likely any line that's entirely kana and/or alphanumeric is not a line you're overly concerned with.

4. Exclude sentences with only words from this: .txt file. Here's my thinking, the program can reject duplicate lines. What if it can reject lines where all words in it are from this list you give it ie nothing new. Not sure if it's feasible. I would use it with words from iKnow for example.
Thanks for the suggestions.

1) & 2): I can probably get around to that someday. However, snapshot processing would take the same amount of time no matter where you start due to the nature of the tools that I'm using.

3) Maybe as an advanced language specific option someday or somehow worked into your fourth suggestion.

4) That would be pretty easy to do. In addition, maybe an option to only process lines that contain certain words. Another advanced option could be to process the lines of only certain characters/actors - the character/actor is an optional field in the .ass subtitle format. This would be useful for shadowing purposes.
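
A rough sketch of the per-actor idea, for illustration only. It assumes the common field order of a `Dialogue:` line (Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text); a real parser should read the order from the file's own `Format:` line. The names `parse_dialogue` and `lines_for_actors` are made up here.

```python
from typing import Iterable, List, Set, Tuple


def parse_dialogue(line: str) -> Tuple[str, str]:
    """Return (actor_name, text) for one 'Dialogue:' line.

    Assumes the Name field is the 5th field and Text is the 10th.
    """
    body = line.split(":", 1)[1]
    # Text is the last field and may itself contain commas,
    # so split at most 9 times.
    fields = body.split(",", 9)
    return fields[4].strip(), fields[9]


def lines_for_actors(ass_lines: Iterable[str], wanted: Set[str]) -> List[str]:
    """Collect the dialogue text spoken by any of the wanted actors."""
    selected = []
    for raw in ass_lines:
        if not raw.startswith("Dialogue:"):
            continue  # skip headers, styles, comments
        actor, text = parse_dialogue(raw.rstrip("\n"))
        if actor in wanted:
            selected.append(text)
    return selected
```

Since the actor field is optional, many subtitle files leave it empty, in which case a filter like this selects nothing.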
Edited: 2009-02-03, 8:28 pm
#59
I like the idea of having a blacklist (would .txt be the best option for this, though?) and a whitelist, and I also like the idea of combining the blacklist options with customizable lengths, so that you can reduce the chance of omitted, redundant kanji compounds or words being part of a longer line with new stuff, so you're not throwing the baby out with the bathwater. Would it be good to have some kind of level-checking Anki plugin? No idea what I'm talking about now, my fuzzy logic falls apart at a certain level of magnification. ;p

Also, both English and Japanese subtitles are available for a certain Urasawa film....

Sounds like a plan, btw, CCC, let us know the url. ;p I don't think it needs to be anything secretive and special, we could just use a simple blog. If music blogs can last for ages, I'm sure Anki deck blogs wouldn't be public enemy #1.
Edited: 2009-02-03, 5:48 pm
#60
cb4960 Wrote:4) That would be pretty easy to do. In addition, maybe an option to only process lines that contain certain words. Another advanced option could be to process the lines of only certain characters/actors - the character/actor is an optional field in the .ass subtitle format. This would be useful for shadowing purposes.
I didn't know that about .ass files, that will make my impending strategy of 'imaginary conversations' easier. (Anyone else read The Invention of Morel?)
Edited: 2009-02-03, 6:02 pm
#61
I have posted version 4.

New Stuff:

Pad Timings (milliseconds): Pad the start and end times of each line of dialog when generating an audio clip. For example, setting the start pad to 250 means that the audio clip will start 250 milliseconds sooner than it would normally. Setting the end pad to 300 means that the audio clip will end 300 milliseconds later than it would normally.
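
The padding arithmetic described above might look something like this. It is a sketch, not the program's code: `srt_time_to_ms` and `pad_span` are hypothetical names, and clamping the start at zero is an assumption about what should happen for the first line of a file.

```python
from typing import Tuple


def srt_time_to_ms(stamp: str) -> int:
    """Convert an .srt timestamp such as '00:01:02,500' to milliseconds."""
    hms, millis = stamp.split(",")
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + int(millis)


def pad_span(start_ms: int, end_ms: int,
             start_pad_ms: int = 0, end_pad_ms: int = 0) -> Tuple[int, int]:
    """Start the clip earlier and end it later by the given pads.

    The start is clamped at 0 so a clip never begins before the video
    (an assumption, not something stated in the release notes).
    """
    return max(0, start_ms - start_pad_ms), end_ms + end_pad_ms
```

With a start pad of 250 and an end pad of 300, a line running from 00:01:02,500 to 00:01:04,000 becomes a clip from 62250 ms to 64300 ms.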

Folder browser remembers last folder.

Bug Fixes:

The frame rate had globalization issues. In some regions, 23.976 would be interpreted as 23976.0. [Thanks HerrPetersen].
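
For anyone curious how 23.976 turns into 23976.0: in regions where "." is the thousands separator and "," the decimal mark, a locale-aware number parser reads the dot as grouping. The snippet below only imitates that behaviour in plain Python (the function names are invented for the illustration); it is not the code that was in subs2srs.

```python
def parse_like_grouping_locale(text: str) -> float:
    """Imitate a parser for a locale where '.' groups thousands and ',' is the decimal mark."""
    return float(text.replace(".", "").replace(",", "."))


def parse_invariant(text: str) -> float:
    """Python's float() always treats '.' as the decimal point, whatever the system locale."""
    return float(text)


if __name__ == "__main__":
    print(parse_like_grouping_locale("23.976"))  # the buggy reading
    print(parse_invariant("23.976"))             # the intended reading
```

The usual fix in any language is to parse user-typed numbers with a fixed, culture-independent format.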

The .srt parser had difficulty with multiline dialog.
Edited: 2009-02-03, 11:37 pm
#62
Everything works good now. Thanks!
Edit: I tried it on a movie with 23 fps, so I typed in 23 in the corresponding box of the program. However avs2avi used a framerate of 23.9?? and as a result 400 sound-files vs 370 pics were produced and when importing into anki the timing between pics and sound were obviously off.
Edited: 2009-02-04, 12:06 pm
#63
HerrPetersen Wrote:I tried it on a movie with 23 fps, so I typed in 23 in the corresponding box of the program. However avs2avi used a framerate of 23.9?? and as a result 400 sound-files vs 370 pics were produced and when importing into anki the timing between pics and sound were obviously off.
It sounds like Avisynth incorrectly determined the framerate of the video. What I can do is force Avisynth to use the framerate provided by the user (in this case 23 fps instead of the incorrect 23.9 fps).
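
To get a feel for why a frame rate mismatch matters, here is a small sketch of one plausible mechanism: if a snapshot's frame number is computed from the subtitle time with one frame rate, but the video is actually decoded at another, the picture lands at the wrong moment. The thread does not say exactly how subs2srs maps times to frames, so `frame_at` and `drift_seconds` are purely illustrative.

```python
def frame_at(seconds: float, fps: float) -> int:
    """Frame index that corresponds to a point in time at the given frame rate."""
    return round(seconds * fps)


def drift_seconds(seconds: float, assumed_fps: float, actual_fps: float) -> float:
    """How far from the intended time a snapshot lands when the frame index
    is computed with assumed_fps but the video really runs at actual_fps."""
    return seconds - frame_at(seconds, assumed_fps) / actual_fps


if __name__ == "__main__":
    # Ten minutes into the video, 23 fps assumed vs 23.976 fps actual.
    print(drift_seconds(600, 23, 23.976))
```

Under these assumptions the picture is already more than 20 seconds off after ten minutes, which is consistent with pictures and sound being "obviously off".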

Edit: In the meantime, you can cancel the snapshot processing so that it leaves behind the avs file. Edit the avs file and add something like the fps argument shown here:

DirectShowSource("F:\TestStreams\xvid.avi", fps=23)

And then run avs2avi manually.
Edited: 2009-02-04, 1:10 pm
#64
I have posted version 4.1 (nothing too exciting).

- Forced Avisynth to use the provided framerate.

- Added some additional validation code.
#65
Thanks for your continuing work, cb4960. I'm about to finally test this program out.
#66
do you have to make a new anki deck to use this? and i was checking out the sample decks i noticed that it didn't have translations to the words is that alright?
#67
Yes you have to have a model in anki, which allows the import of the data in the csv-file subs2srs creates. So having fields for
source, sound, pic, subs1, subs2 will do
I like to have an excel-sheet with everything I put into anki, so I add some further fields like
definitions, commentary etc.
But this is not a big deal.
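
As a rough picture of what one import row for such a model could contain: the field names come from the list above, `[sound:...]` and `<img src="...">` are the usual Anki ways to reference media, and everything else (the helper names, the tab-separated layout, the sample file names) is an assumption rather than the exact file subs2srs writes.

```python
from typing import List

# Field order of the hypothetical Anki model described above.
FIELDS = ["source", "sound", "pic", "subs1", "subs2"]


def make_row(source: str, audio_file: str, image_file: str,
             subs1: str, subs2: str) -> List[str]:
    """Build one row whose cells line up with FIELDS."""
    return [source,
            f"[sound:{audio_file}]",
            f'<img src="{image_file}">',
            subs1,
            subs2]


def to_tsv_line(row: List[str]) -> str:
    """Join the cells with tabs, replacing stray tabs and newlines so the
    row stays on one line with exactly one cell per field."""
    cleaned = [cell.replace("\t", " ").replace("\n", " ") for cell in row]
    return "\t".join(cleaned)
```

Extra fields such as definitions or commentary would simply be appended to both `FIELDS` and the row.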
#68
Hello,

I have released version 5. Some interesting new features:

1) Option to select which actors/characters you want to process. (.ass/.ssa files only)

2) Option to specify words or phrases to include in or exclude from processing.

3) Option to shift timings.

4) For Japanese, option to only process lines that contain at least one kanji.
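
The "at least one kanji" check in option 4 can be approximated with a single regular expression. This is a minimal sketch under the assumption that "kanji" means the basic CJK Unified Ideographs block (U+4E00 to U+9FFF); rarer characters in the extension blocks and the repetition mark 々 are not covered, and the program itself may use a different test.

```python
import re

# Basic CJK Unified Ideographs block only (an assumption, see above).
KANJI = re.compile(r"[\u4e00-\u9fff]")


def has_kanji(line: str) -> bool:
    """Return True if the line contains at least one kanji."""
    return KANJI.search(line) is not None


def keep_kanji_lines(lines):
    """Drop lines made up only of kana, digits, Latin letters or punctuation."""
    return [line for line in lines if has_kanji(line)]
```
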

Enjoy,
cb4960
Edited: 2009-02-07, 2:18 am
#69
Great stuff! Is there something that blocks the way to make ripping sound part of the process? This would cut down the amount of time you have to spend, before actually using the program.
Edited: 2009-02-07, 6:40 am
#70
HerrPetersen Wrote:Great stuff! Is there something that blocks the way to make ripping sound part of the process? This would cut down the amount of time you have to spend, before actually using the program.
Yeah, it would be nice if subs2srs would automatically rip the audio from the videos that you provide to it. That feature might be a long way off, though. I'll have to investigate audio ripping/conversion tools that I can use to aid in this process.
#71
I watched 時をかける少女 a couple of days ago and just started using your deck last night, and all I can say is... it's a piece of art! I wish I had more decks like that one, or at least more media with transcriptions to use subs2srs.

In other words, thanks a lot, your work is awesome!

Actually, I've fallen in love with the deck, learning was never this much fun!

BTW, did somebody realize that you can use subs2srs to process your favorite music?! Getting the lyrics is easy, then making the timing by yourself shouldn't take more than a couple of minutes. From that, you can create an Anki deck with the audio and lyrics, and go through it card by card doing all the research needed to understand each line of the song (like vocabulary, sentence patterns, collocations, etc). That way, you don't only learn Japanese, but at the same time get ready for your next karaoke party! :lol:

Of course, you can add the video clip to the deck too. It isn't that hard to get them via YouTube or something. :)
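
For the lyrics idea, hand-made timings still have to end up in a subtitle file before subs2srs can use them. A small sketch of writing .srt text from a list of (start, end, lyric line) entries follows; `ms_to_srt_time` and `build_srt` are hypothetical helper names, and the timings are whatever you noted down while listening.

```python
from typing import Iterable, Tuple


def ms_to_srt_time(ms: int) -> str:
    """Format milliseconds as an .srt timestamp, e.g. 62500 -> '00:01:02,500'."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def build_srt(cues: Iterable[Tuple[int, int, str]]) -> str:
    """Turn (start_ms, end_ms, text) cues into .srt text, numbered from 1."""
    blocks = []
    for index, (start_ms, end_ms, text) in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n"
            f"{ms_to_srt_time(start_ms)} --> {ms_to_srt_time(end_ms)}\n"
            f"{text}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(blocks)
```

Saving the result as UTF-8 keeps the Japanese lyrics intact.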
Edited: 2009-02-07, 3:16 pm
#72
Ffmpeg can rip a clip right from the video.
Code:
ffmpeg -i inputFile -ss startTime -t duration output.mp3
The times need to be in hh:mm:ss.SSS format.

Ffmpeg is a beast to use, but it can do just about anything. :)
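
If the command above were driven from a script, one clip per subtitle line, building the argument list might look like this. It is a sketch only: `ms_to_ffmpeg_time` and `clip_command` are made-up names, and it uses nothing beyond the `-i`, `-ss` and `-t` options already shown.

```python
from typing import List


def ms_to_ffmpeg_time(ms: int) -> str:
    """Format milliseconds as hh:mm:ss.SSS."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def clip_command(input_file: str, start_ms: int, end_ms: int,
                 output_file: str) -> List[str]:
    """Build the ffmpeg argument list for one audio clip.

    -t takes a duration, so it is computed from the start and end times.
    """
    return ["ffmpeg", "-i", input_file,
            "-ss", ms_to_ffmpeg_time(start_ms),
            "-t", ms_to_ffmpeg_time(end_ms - start_ms),
            output_file]
```

The list can be handed to `subprocess.run(cmd, check=True)`; passing arguments as a list avoids quoting trouble with spaces in file names.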
Edited: 2009-02-07, 5:30 pm
#73
Killersquierl Wrote:Ffmpeg can rip a clip right from the video.
Code:
ffmpeg -i inputFile -ss startTime -t duration output.mp3
The times need to be in hh:mm:ss.SSS format.

Ffmpeg is a beast to use, but it can do just about anything. :)
I've only had time to glance at the docs so far, but I think that's what I'm looking for. Thanks for the help.

Edit1: It looks like I can use it to make snapshots too.

Edit2: I just ripped some MP3 audio with it. I ripped it from a video with an MKV container and AAC audio. It works perfectly!

Edit3: Wow, I just got it to take a single snapshot from any point within the video (unlike the current method that must run through the whole video).

Edit4: I just found something that ffmpeg can't do: Unicode. Of course, the tools I'm currently using don't support it either.
Edited: 2009-02-08, 1:38 am
#74
Yeah, no problem. I'm trying to make a port of your program to Java, so I've been looking for some alternatives to AviSynth. I've got most of the basics implemented, but I still have some work to do before it's functional and I still need to make the GUI. The source code and picture of the GUI have been very helpful. :)

Have you been able to make ffmpeg take only a single snapshot? Whenever I use it it makes one plus a duplicate. Not exactly a showstopper, but it is a bother.

Also, what do you mean it can't handle Unicode? Are you talking about the file names?
#75
I used this:
ffmpeg.exe -i test.mkv -ss 20.150 -s 256x144 -an -f image2 test.jpeg

I can't get Unicode file names to work. Maybe I'm using a build that wasn't compiled with Unicode support. Can you get it to work?

Edit: as I increase the start time of the snapshot, it takes longer and longer to process. So maybe I'll just use my current Avisynth solution for taking snapshots.
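
Two ffmpeg options may be worth trying for the issues mentioned here, though neither was tested against the build used in this thread. `-vframes 1` asks for exactly one output frame, which should avoid the duplicate image. Placing `-ss` before `-i` makes ffmpeg seek in the input instead of decoding from the beginning, which is usually much faster for late timestamps, possibly at some cost in accuracy. The helper name `snapshot_command` is invented for the sketch.

```python
from typing import List


def snapshot_command(input_file: str, seconds: float,
                     size: str, output_file: str) -> List[str]:
    """Build an ffmpeg argument list for a single snapshot.

    -ss comes BEFORE -i so it applies to the input (seek, then decode);
    -vframes 1 limits the output to one frame.
    """
    return ["ffmpeg",
            "-ss", f"{seconds:.3f}",
            "-i", input_file,
            "-vframes", "1",
            "-s", size,
            "-an",
            "-f", "image2",
            output_file]
```

For example, `snapshot_command("test.mkv", 20.15, "256x144", "test.jpeg")` mirrors the command above with the two changes applied.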
Edited: 2009-02-08, 2:26 am