Use subs2srs to Create Anki Decks Based on Your Favorite Movie or Show

cb4960

Hello,

subs2srs is a small utility that allows you to create SRS (for example Anki) import files based on your favorite foreign language movies and TV shows to aid in the language learning process. Supported platforms: Windows (using .NET), Linux/Mac (using Mono).

Download subs2srs v21.0 via SourceForge

Current Version: 21.0

Windows users will need to have the .NET Framework installed.
Linux/Mac users will need Mono and ffmpeg installed.

For Linux setup help, see the readme file.
For Mac setup help, see this post.

Have Fun!
cb4960

===========================================================
Usage File
===========================================================
subs2srs Usage

Table of Contents

    - Description
    - How to Use subs2srs
    - Importing Into Anki
    - How to Use Batch Processing in subs2srs
    - Explanation of each option found in the user interface:
        - Main Interface
        - Advanced Subtitle Options
        - Extract Audio from Media Tool
        - Dueling Subtitles Tool
        - Subtitle Style
    - Frequently Asked Questions

===========================================================
Description

subs2srs is a small utility that allows you to create SRS (for example Anki) import files based on your favorite foreign language movies and TV shows to aid in the language learning process.

This utility will parse through subtitle files, extract the dialog and timing information, and then use that information to generate audio clips, snapshots, and video clips for each line of dialog.
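
To make the "extract the dialog and timing information" step concrete, here is a minimal Python sketch of parsing a SubRip (.srt) document. It is only an illustration of the idea, not the parser that subs2srs uses; the names Line, to_ms, and parse_srt are made up for this example.

```python
import re
from dataclasses import dataclass

# Matches an SRT timing row such as "00:01:02,500 --> 00:01:04,000".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


@dataclass
class Line:
    start_ms: int
    end_ms: int
    text: str


def to_ms(h, m, s, ms):
    """Convert hour/minute/second/millisecond strings to milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def parse_srt(content):
    """Return the dialog lines of an .srt document in file order."""
    lines = []
    normalized = content.replace("\r\n", "\n").strip()
    # Subtitle blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        rows = block.split("\n")
        for i, row in enumerate(rows):
            match = TIMING.search(row)
            if match:
                g = match.groups()
                # Everything after the timing row is the dialog text.
                text = " ".join(r.strip() for r in rows[i + 1:] if r.strip())
                lines.append(Line(to_ms(*g[:4]), to_ms(*g[4:]), text))
                break
    return lines


sample = """1
00:00:01,000 --> 00:00:03,500
こんにちは

2
00:00:04,000 --> 00:00:06,250
元気ですか
"""

for line in parse_srt(sample):
    print(line.start_ms, line.end_ms, line.text)
```

The start and end times recovered this way are what the later steps (audio clips, snapshots, video clips) would be cut from.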

The following subtitle types are supported:
    - Subrip (.srt)
    - Advanced Substation Alpha (.ass, .ssa)
    - Vobsub (.idx/.sub)
    - Lyric (.lrc)

subs2srs relies on ffmpeg to extract video and thus supports most video codecs (XviD, h.264, DivX, MPEG-2, etc.) and most video containers (.avi, .mkv, .ogm, .mp4, .flv, .vob, etc.).
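
For readers curious what cutting a clip with ffmpeg can look like, here is a hedged Python sketch that only builds a command line (it does not run it). The options used (-ss, -t, -i, -vn, -b:a) are standard ffmpeg options, but the exact command that subs2srs issues is not documented here, so treat the function name and argument order as assumptions.

```python
def audio_clip_command(video, start_ms, end_ms, out_path, bitrate_kbps=128):
    """Build an ffmpeg argument list that cuts one audio clip from a video."""
    return [
        "ffmpeg",
        "-ss", f"{start_ms / 1000:.3f}",               # seek to the clip start (seconds)
        "-t", f"{(end_ms - start_ms) / 1000:.3f}",     # clip duration (seconds)
        "-i", video,
        "-vn",                                         # drop the video stream
        "-b:a", f"{bitrate_kbps}k",                    # audio bitrate
        out_path,
    ]


# Hypothetical file names, for illustration only.
cmd = audio_clip_command("episode01.avi", 61500, 64000, "clip_0001.mp3")
print(" ".join(cmd))
```

The list form can be passed to subprocess.run if you want to experiment with it yourself.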

With all options enabled, a card will have this information generated for it (assume that it is a Japanese movie):

1) Textual line of Japanese dialog (taken from the Japanese subtitle file)
2) English translation (taken from the English subtitle file)
3) Audio clip
4) Snapshot
5) Video clip (you would not normally want to generate an audio clip AND a video clip; this is just an example).
6) Tag information (for Anki import).
7) Sequence marker (episode #, line #, and timing)
8) Context Info (previous and next lines)

Here is an example of an Anki flash card that can be generated with subs2srs:
http://a.imageshack.us/img195/2441/v21tokiwokakerushoujoan.png

Download one of the Toki wo Kakeru Shoujo sample decks:
1) Version with audio clips and snapshots (480x270). Includes context information. 1558 lines. ~68 MB. Download here.
2) Version with video clips and snapshots (both 480x270). Includes context information. 1558 lines. ~388 MB. Part 1, Part 2.

    Download other sample decks from the RTK subs2srs decks wiki.

    Download other sample decks from the Learn Any Language subs2srs decks wiki.



===========================================================
How to Use subs2srs

1) Obtain any subtitle files and multimedia files that you may need from the TV show or movie that you want to learn from.

2) Click on the Subs1… button and select a subtitle file in the language that you are trying to learn (your target language).

3) Click on the Output… button and select the directory that subs2srs can use to place the files that it generates.

4) (Optional) Click on the Subs2… button and select a subtitle file in your native language. subs2srs will match this subtitle file up with the one that you specified previously in Subs1.

5) (Optional) Click on the Video… button and select a video file. You need the video file in order to generate audio clips, snapshots, and video clips for your flash cards.

6) If you want to generate audio clips for your flash cards, check the Generate Audio Clips checkbox.

7) If you want to generate snapshots for your flash cards, check the Generate Snapshots checkbox.

8) If you want to generate video clips for your flash cards, check the Generate Video Clips checkbox. Note: It doesn’t usually make sense to check both the Generate Audio Clips and the Generate Video Clips checkboxes.

9) In the Name of deck textbox, enter a name to use when generating various filenames. This is usually the name of the TV show or movie that you will be learning from.

10) Fill out any other option you may need. More information about these options can be found further down in this help file. Here is a screenshot showing an example setup with Audio Clips and Snapshots enabled:
http://a.imageshack.us/img529/8333/v21main.png 

11) Click the Preview… button to show a preview of what subs2srs will generate. Make sure that the Subs1 and Subs2 match up correctly.

12) Click the Go! button to begin processing. This can take a while depending on the options that you have selected.

13) After subs2srs is finished processing, open the directory that you entered for the Output… option to see the files generated by subs2srs.

14) See the next section, entitled Importing Into Anki, to learn how to import the subs2srs files into Anki. Anki is Spaced Repetition Software (SRS)/electronic flash card software.


===========================================================
Importing Into Anki

In the section entitled How to Use subs2srs you should have learned how to use subs2srs to generate a tab-separated-value (.tsv) SRS import file (that can be used with Anki) and a media folder. All you need to do now is create a new Anki deck with the fields that you require:
    1)  Expression
    2)  Meaning
    3)  Audio clip
    4)  Snapshot
    5)  Video Clip

You can either do this yourself, or you can use the Anki deck template that is packaged with subs2srs (it is named subs2srs_template.anki and is located in the Anki Deck Template folder).

Here is a simple guide demonstrating how to import the subs2srs generated files into Anki using the provided Anki deck template.

1)  Copy the Anki deck template (subs2srs_template.anki) into the output folder.

    http://img144.imageshack.us/img144/872/setupankifolder1.png

2)  Rename the Anki deck template to match the .media folder.

    http://img186.imageshack.us/img186/2967/setupankifolder2.png

3)  Double click the newly renamed .anki file to open it. Then click on the File | Import menu option.

    http://img141.imageshack.us/img141/3310/setupankiimportmenu.png

4)  Open the .tsv import file.

http://img79.imageshack.us/img79/2699/setupankiimportopen.png

5) In the Import dialog, map each part of the import file to an Anki field. In this example, the format of the import file is tag, sequence marker, audio clip, image, expression, and meaning. Now just click Import and you’re done!


http://img521.imageshack.us/img521/8995/setupankiimportmain.png

    Note: The first value in the import file is always the tag and the second value in the import file is always the sequence marker.
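
If you want to inspect the .tsv import file before importing it, a small Python sketch like the following can help. It assumes the column order from the example above (tag, sequence marker, audio clip, image, expression, meaning); the sample row and the field names are hypothetical, and your own file may have different columns depending on the options you selected.

```python
import csv
import io

# Column order from the example above; adjust to match your own settings.
FIELDS = ["tag", "sequence_marker", "audio", "snapshot", "expression", "meaning"]


def read_import_file(handle):
    """Read a tab-separated import file into a list of dictionaries."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [dict(zip(FIELDS, row)) for row in reader if row]


# Hypothetical row for illustration; real files are produced by subs2srs.
sample = (
    "Toki_wo_Kakeru_Shoujo\t001_0001\tclip_0001.mp3\tsnap_0001.jpg"
    "\t行ってきます\tI'm off\n"
)

rows = read_import_file(io.StringIO(sample))
print(rows[0]["tag"], rows[0]["sequence_marker"], rows[0]["expression"])
```

For a real file, open it with open(path, encoding="utf-8", newline="") and pass the handle to read_import_file.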

===========================================================
How to Use Batch Processing in subs2srs

Batch processing is the execution of multiple tasks without manual intervention. subs2srs batch processing enables the user to process multiple episodes (for instance, of a TV show) at once. Contrast this with the method described in the How to Use subs2srs section above, where you processed one episode at a time.

subs2srs allows you to perform batch processing through the use of wildcard characters. There are two wildcard characters that you may use:

Wildcard      Meaning
*                Match zero or more characters
?                Match exactly zero or one character

When working with multiple files, the files will be matched up alphabetically.

As an example of batch processing, say that you have 3 episodes of a TV show. You put your 3 subtitle files (in .srt format) and 3 video files (in .avi format) in a directory called C:\Temp. Now say that you want to create a deck with the text from the subtitle files and the corresponding audio from the video files for all 3 episodes.

To do this,

1. Enter the following into the Subs1… option: C:\Temp\*.srt
2. Enter the following into the Video… option: C:\Temp\*.avi
3. Check the Generate Audio Clips checkbox.
4. Click the Go! button. This will cause all 3 episodes to be processed at one time.

Here is a screenshot showing wildcards in use:
http://a.imageshack.us/img801/3505/v21mainwildcard.png
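
The wildcard expansion and alphabetical pairing described above can be sketched in Python as follows. This is an approximation under stated assumptions: Python's fnmatch treats ? as exactly one character (the table above says zero or one), sorted() uses plain code point order, and raising an error on a count mismatch is a choice made for this example rather than documented subs2srs behavior.

```python
import fnmatch


def expand(pattern, filenames):
    """Return the file names matching the wildcard pattern, sorted."""
    return sorted(n for n in filenames if fnmatch.fnmatchcase(n, pattern))


def pair_episodes(subs_pattern, video_pattern, filenames):
    """Pair the Nth subtitle file with the Nth video file after sorting."""
    subs = expand(subs_pattern, filenames)
    videos = expand(video_pattern, filenames)
    if len(subs) != len(videos):
        raise ValueError(f"{len(subs)} subtitle files but {len(videos)} videos")
    return list(zip(subs, videos))


# Hypothetical directory listing for C:\Temp.
files = ["ep01.srt", "ep02.srt", "ep03.srt",
         "ep01.avi", "ep03.avi", "ep02.avi", "notes.txt"]

for subs, video in pair_episodes("*.srt", "*.avi", files):
    print(subs, "<->", video)
```

Because pairing depends only on sort order, consistent zero-padded episode numbers (ep01, ep02, ...) in both sets of file names keep the pairs aligned.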


===========================================================
The Main Interface

http://a.imageshack.us/img841/1930/v21anomain.png

1) Menu
File | New:  Reset all fields to default values.
File | Open…:  Restore previous interface state.
File | Save…:  Save current state of the interface (the actor list will not be saved).
File | Exit:  Exit subs2srs
Tools | Extract Audio from Media…:  Show the “Extract Audio from Media” tool.
Tools | Dueling Subtitles Tool…:  Show the “Dueling Subtitles” tool.
Help | Usage…:  Show this file
Help | About…:  Display information about subs2srs.

2) Subs1: The subtitle file(s) in the language that you are trying to learn (your target language). For example, the Japanese subtitles for a TV show. The file(s) may be in .srt, .ass, .ssa, .lrc and/or Vobsub (.idx/.sub) format.

Note 1: The contents of any text-based subtitle file (.srt, .ass, .ssa, .lrc) must have a UTF-8 or ASCII encoding.
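
If you are unsure whether a subtitle file meets the encoding requirement, a quick check such as this Python sketch can tell you before you load it. The function name is made up for this example, and it only tests whether the bytes decode as UTF-8 (ASCII is a subset of UTF-8, so ASCII files pass too).

```python
def is_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8 (ASCII included)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(is_utf8("日本語".encode("utf-8")))       # UTF-8 bytes
print(is_utf8("日本語".encode("shift_jis")))   # Shift-JIS bytes
```

For a real file, read it in binary mode (open(path, "rb").read()) and pass the result to is_utf8. A file that fails the check can be re-saved as UTF-8 in a text editor.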
   
You may use the following wildcards in order to specify multiple files:
    * = Match zero or more characters
    ? = Match exactly zero or one character

Example: enter C:\Temp\*.srt to use all .srt files in C:\Temp.

Note 2: When working with multiple files, the files will be matched up alphabetically.

3) [Subs1] Subtitle Stream: The stream to use with the Vobsub (.idx/.sub) subtitles in Subs1. This option is only enabled when there is a Vobsub (.idx/.sub) file in Subs1.

Note: A single Vobsub (.idx/.sub) can have multiple streams (one for English, one for Japanese, etc.). Because of this, you can use the same subtitle file for Subs1 and Subs2 but choose different streams for each.

4) Output: Directory where the generated files will be placed. The following files and directories will be generated here:
       /Media directory
       All audio clips
       All snapshots
       All video clips
       All image text from a vobsub
       SRS (ex. Anki) input file (a .tsv file - tab separated values)

5) Subs2: The corresponding subtitle file(s) in your native language. For example, the English subtitles for a movie. The file(s) may be in .srt, .ass, .ssa, .lrc and/or Vobsub (.idx/.sub) format. This field is optional. Leave blank if you do not want or have the corresponding set of subtitles.

Note 1: The contents of any text-based subtitle file (.srt, .ass, .ssa, .lrc) must have a UTF-8 or ASCII encoding.
   
You may use the following wildcards in order to specify multiple files:
    * = Match zero or more characters
    ? = Match exactly zero or one character

Example: enter C:\Temp\*.srt to use all .srt files in C:\Temp.

Note 2: When working with multiple files, the files will be matched up alphabetically.

Note 3 (important): Subs1 is compared against Subs2 and not the other way around.

This means that if a subtitle file from Subs1 contains 300 lines and a subtitle file from Subs2 contains 310 lines, the maximum number of lines that will be processed is 300 (the number of lines from Subs1).

Of course, this also means that if a subtitle file from Subs1 contains 310 lines and a subtitle file from Subs2 contains only 300 lines, then 310 lines will be processed and 10 of those lines will be mismatched. See following note.
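
The following Python sketch illustrates why the number of processed lines follows Subs1. It assumes, purely for illustration, that each Subs1 line is paired with the Subs2 line whose timing overlaps it the most; the actual matching rules used by subs2srs are not described in this file.

```python
def overlap_ms(a, b):
    """Length of the time overlap between two (start_ms, end_ms, text) lines."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def match_lines(subs1, subs2):
    """Return one (subs1_text, subs2_text) pair per Subs1 line."""
    matched = []
    for line1 in subs1:
        best = max(subs2, key=lambda line2: overlap_ms(line1, line2), default=None)
        if best is not None and overlap_ms(line1, best) > 0:
            matched.append((line1[2], best[2]))
        else:
            # No overlapping Subs2 line: the Subs1 line is still kept.
            matched.append((line1[2], ""))
    return matched


subs1 = [(1000, 3000, "A"), (4000, 6000, "B")]
subs2 = [(900, 2900, "a"), (3100, 3500, "x"), (4100, 6100, "b")]

print(match_lines(subs1, subs2))
```

Here Subs2 has three lines but only two pairs come out, one per Subs1 line; the extra Subs2 line is simply never used.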

6) [Subs2] Subtitle Stream: The stream to use with the Vobsub (.idx/.sub) subtitles in Subs2. This option is only enabled when there is a Vobsub (.idx/.sub) file in Subs2.

Note: A single Vobsub (.idx/.sub) can have multiple streams (one for English, one for Japanese, etc.). Because of this, you can use the same subtitle file for Subs1 and Subs2 but choose different streams for each.

7) Video: The video file(s) that correspond to the subtitle file(s). Videos may use any format supported by ffmpeg (.avi, .mkv, etc.).
   
    Note 1: Video filenames must not contain any Unicode characters
   
You may use the following wildcards in order to specify multiple files:
    * = Match zero or more characters
    ? = Match exactly zero or one character

Example: enter C:\Temp\*.avi to use all .avi files in C:\Temp.

Note 2: When working with multiple files, the files will be matched up alphabetically.

8) Audio Stream: Some videos contain multiple audio streams/tracks. This option allows you to select which audio to use when making audio clips and video clips.

9) Use Timings From: Subtitles contain timing information that determines when they appear and for how long. You may select to use the timing information of Subs1 or Subs2. The timings are used for generating the snapshots, audio clips, and video clips. This option may be useful if one of your subtitles was specifically timed for the specific video source that you are using.

10) Span (h:mm:ss): Only process lines that start within the specified span of time. Span is applied after the time shift is applied.

11) Pre-Time Shift: Shift Subs1 or Subs2 dialog timings before the lines are processed. This is equivalent to using an external program (such as Aegisub) to apply a time shift to the Subs1 or Subs2 subtitle file. Use this to reduce the gap between the Subs1 and Subs2 timings, which will reduce errors when processing/matching subtitle lines.
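
As a rough illustration of how the Span and Pre-Time Shift options interact, this Python sketch shifts the timings first and then keeps only the lines that start inside the span. Treating both ends of the span as inclusive is an assumption made for the example.

```python
def shift(lines, offset_ms):
    """Shift every (start_ms, end_ms, text) line by offset_ms."""
    return [(s + offset_ms, e + offset_ms, t) for s, e, t in lines]


def within_span(lines, span_start_ms, span_end_ms):
    """Keep lines whose start time falls inside the span (inclusive)."""
    return [line for line in lines if span_start_ms <= line[0] <= span_end_ms]


lines = [(500, 1500, "a"), (2000, 3000, "b"), (61000, 62000, "c")]

# Shift 1 second earlier, then keep lines starting within 0:00:00-0:00:30.
shifted = shift(lines, -1000)
print(within_span(shifted, 0, 30_000))
```

Line "a" starts before the span once shifted and line "c" starts after it, so only line "b" survives.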

12) Post-Time Shift: Shift timings forward or backward by the provided number of milliseconds after the lines have been processed. For example, setting the shift to -1000 will make all subtitles start (and end) 1 second sooner.

Note: This is an option of convenience for the user as the same result can be achieved by adjusting the Pre-Time Shift option for both Subs1 and Subs2.

13) More Options…: Show the advanced subtitle options dialog (see the below section for more information).

14) Generate Audio Clips: Enable/Disable the generation of audio clips.

15) Source: Select where to get the audio tracks from. You have two options here. You can either have subs2srs automatically extract the audio from the video file(s) at the specified bitrate or you can provide corresponding .mp3 audio tracks for each subtitle file. That is, one audio track for each episode of a TV show. The file(s) must be in .mp3 format.

You may use the following wildcards in order to specify multiple files:
    * = Match zero or more characters
    ? = Match exactly zero or one character

Example: enter C:\Temp\*.mp3 to use all .mp3 files in C:\Temp.

Note 1: When working with multiple files, the files will be matched up alphabetically.

Note 2: Automatic audio extraction has been successfully tested with audio tracks that are in the following formats: AC3, AAC, VORBIS, and MP3. Make sure that your AC3 file does not have DRM protection.

Note 3: If you choose to supply the .mp3 file(s), the filenames within must not contain any Unicode characters.

16) [Audio Clip] Pad Timings: Pad the start and end times of each line of dialog when generating an audio clip. For example, setting the start pad to 250 means that the audio clip will start 250 milliseconds sooner than it would normally. Setting the end pad to 300 means that the audio clip will end 300 milliseconds later than it would normally.
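
The padding arithmetic can be written out in a few lines of Python. Clamping the padded start at zero is an assumption added for this sketch so that a clip near the beginning of the file never gets a negative start time.

```python
def pad(start_ms, end_ms, start_pad_ms=0, end_pad_ms=0):
    """Return the padded (start_ms, end_ms) of a line of dialog."""
    return max(0, start_ms - start_pad_ms), end_ms + end_pad_ms


# A line from 0:00:10.000 to 0:00:12.000 with a 250 ms start pad
# and a 300 ms end pad.
print(pad(10_000, 12_000, start_pad_ms=250, end_pad_ms=300))
```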

17) Generate Snapshots: Enable/Disable the generation of snapshots. All snapshots will be taken at the halfway point of the dialog.
   
18) [Snapshot] Dimensions: Width and height of the generated snapshots (in pixels). You do not need to compensate for the crop. subs2srs will automatically resize as appropriate in order to compensate for the crop.

When you click the “>” button, the following dialog will appear to help you choose the snapshot dimensions to use based on a percentage of the actual video size.

http://img843.imageshack.us/img843/9625/v20dimensionschooser.png
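
The percentage-based calculation offered by the dimensions chooser amounts to something like the Python sketch below. Rounding down to an even number of pixels is an assumption made here because many video encoders expect even dimensions; the dialog itself may round differently.

```python
def scaled_dimensions(width, height, percent):
    """Scale a video size by a percentage, rounding down to even values."""
    def scale(value):
        scaled = round(value * percent / 100)
        return max(2, scaled - scaled % 2)
    return scale(width), scale(height)


# 25% of a 1920x1080 source.
print(scaled_dimensions(1920, 1080, 25))
```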
       
19) Crop Bottom: A crop may be applied to the bottom of the snapshot. This might be useful for removing hard subbed subtitles. The crop is applied against the original resolution of the video and not the resized resolution. Enter 0 if you do not wish to crop.

20) Generate Video Clips: Enable/Disable the generation of video clips. All generated video clips use XviD for the video and MP3 for the audio. These are stuffed into an AVI container.

21) [Video Clip] Dimensions: Width and height of the generated video clips (in pixels). You do not need to compensate for the crop. subs2srs will automatically resize as appropriate in order to compensate for the crop.

When you click the “>” button, the following dialog will appear to help you choose the video clip dimensions to use based on a percentage of the actual video size.

http://img843.imageshack.us/img843/9625/v20dimensionschooser.png

       
22) Crop Bottom: A crop may be applied to the bottom of the video clip. This might be useful for removing hard subbed subtitles. The crop is applied against the original resolution of the video and not the resized resolution. Enter 0 if you do not wish to crop.

23) Bitrates: The bitrate to use for audio and video when generating video clips. Higher bitrates mean higher quality at the expense of larger file sizes.

24) [Video Clip] Pad Timings: Pad the start and end times of each line of dialog when generating a video clip. For example, setting the start pad to 250 means that the video clip will start 250 milliseconds sooner than it would normally. Setting the end pad to 300 means that the video clip will end 300 milliseconds later than it would normally.

25) Name of Deck: An arbitrary name to associate with your deck. It will be used in the filenames of the generated files. Any space characters will be converted to an underscore.

Note 1: You cannot use the following characters (as Windows will not allow these in filenames): \ / : * ? " < > |
   
    Note 2: You cannot use any Unicode characters.
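
The deck name rules above can be checked with a short Python sketch like this one. The function name and the error messages are invented for the example; subs2srs reports such problems through its own interface.

```python
# Characters that Windows does not allow in file names.
FORBIDDEN = set('\\/:*?"<>|')


def deck_name_to_file_stem(name):
    """Validate a deck name and convert spaces to underscores."""
    bad = sorted(ch for ch in set(name) if ch in FORBIDDEN)
    if bad:
        raise ValueError(f"forbidden characters: {' '.join(bad)}")
    if not name.isascii():
        raise ValueError("deck name must not contain Unicode characters")
    return name.replace(" ", "_")


print(deck_name_to_file_stem("Toki wo Kakeru Shoujo"))
```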

26) First Episode: The first number to use when generating filenames, tags, and sequence markers. If you wanted to start at episode 3 (that is, the files for episode 1 and episode 2 are not in the provided directories), then put a 3 in this box.

27) Preview: Show the Preview dialog (see the below section for more information).

28) Go!: Start processing the subtitles. subs2srs will warn you if you have made any input errors (a red icon will appear next to each field that contains an error). A progress bar will appear for each step (subtitle processing, Anki import file generation, audio clip generation, snapshot generation, and video clip generation). During some of these steps, a command prompt will appear that will inform you of progress.



===========================================================
The Advanced Subtitle Options Interface

http://img685.imageshack.us/img685/2236/v17anoadvprunesubs1.png

1)  Open From File (Include): Open a file containing a semicolon separated list of words or phrases to use as the include list.

2)  Include List: Enter a semicolon separated list of words or phrases here. In order to be processed, a line of dialog will have to contain at least one of these words or phrases. This can be helpful if you are trying to get examples of certain words that you are studying. Space characters will not be stripped out. Has no effect when using Vobsubs (.idx/.sub).

3)  Open From File (Exclude): Open a file containing a semicolon separated list of words or phrases to use as the exclude list.

4)  Exclude List: Enter a semicolon separated list of words or phrases here. In order to be processed, a line of dialog must not contain any of these words or phrases. Space characters will not be stripped out. Has no effect when using Vobsubs (.idx/.sub).

5)  Remove Duplicates From: Remove duplicate lines of dialog from either Subs1 or Subs2. When enabled, only the first instance of a line will be used. Has no effect when using Vobsubs (.idx/.sub).

6)  Exclude lines with fewer than # characters: You may set a minimum character length for a line of dialog.  Any line not meeting the minimum character length will not be processed. The purpose is to help eliminate easy/trivial lines. This feature may be Enabled/Disabled. Has no effect when using Vobsubs (.idx/.sub).

7)  Exclude lines shorter than # milliseconds: You may set a minimum length for a line in milliseconds. Any line whose timing does not meet the minimum number of milliseconds will not be processed.

8)  Exclude lines longer than # milliseconds: You may set a maximum length for a line in milliseconds. Any line whose timing exceeds the maximum number of milliseconds will not be processed.
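
Taken together, the pruning options above behave roughly like the filter in this Python sketch. The order in which the checks run and the function names are assumptions for illustration; note that list entries are split on semicolons without stripping spaces, as described above.

```python
def parse_list(raw):
    """Split a semicolon separated list; spaces are deliberately kept."""
    return [item for item in raw.split(";") if item]


def prune(lines, include=(), exclude=(), min_chars=0,
          min_ms=0, max_ms=None, remove_duplicates=False):
    """Filter (start_ms, end_ms, text) lines using the prune options."""
    kept, seen = [], set()
    for start, end, text in lines:
        duration = end - start
        if include and not any(word in text for word in include):
            continue  # must contain at least one include entry
        if any(word in text for word in exclude):
            continue  # must not contain any exclude entry
        if len(text) < min_chars:
            continue
        if duration < min_ms:
            continue
        if max_ms is not None and duration > max_ms:
            continue
        if remove_duplicates:
            if text in seen:
                continue  # only the first instance of a line is used
            seen.add(text)
        kept.append((start, end, text))
    return kept


lines = [
    (0, 1000, "はい"),
    (1000, 4000, "学校に行きます"),
    (5000, 8000, "学校に行きます"),
    (9000, 30000, "長い独白の学校"),
]

result = prune(lines, include=parse_list("学校;先生"), min_chars=3,
               max_ms=10_000, remove_duplicates=True)
print(result)
```

In this run the first line lacks an include word, the third is a duplicate, and the fourth is too long, so only the second line remains.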

http://img40.imageshack.us/img40/4915/v17anoadvcontext.png

1)  Number of lines leading up to a line of dialog: The number of lines before a line of dialog to attach to a fact.

2)  Include audio clip/snapshot/video clip information: You may choose which information from the leading lines you would like to appear in the .tsv import file. For example, if you would only like to import the audio clip information from the leading lines, then check only that option. This feature makes the .tsv import file less cluttered with information that is not important to you and makes it easier to import into Anki.

        Note: The Subs1 text and Subs2 text of the leading lines is always placed into the .tsv file.

3)  Number of lines trailing a line of dialog: The number of lines after a line of dialog to attach to a fact.

4)  Include audio clip/snapshot/video clip information: You may choose which information from the trailing lines you would like to appear in the .tsv import file. For example, if you would only like to import the audio clip information from the trailing lines, then check only that option. This feature makes the .tsv import file less cluttered with information that is not important to you and makes it easier to import into Anki.

       Note: The Subs1 text and Subs2 text of the trailing lines is always placed into the .tsv file.
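
Attaching leading and trailing lines to a fact is essentially list slicing, as this small Python sketch shows. How subs2srs handles lines near the start or end of an episode is not stated above; here the context is simply shorter when fewer neighbors exist.

```python
def with_context(lines, index, leading=1, trailing=1):
    """Return (leading lines, the line itself, trailing lines)."""
    before = lines[max(0, index - leading):index]
    after = lines[index + 1:index + 1 + trailing]
    return before, lines[index], after


lines = ["a", "b", "c", "d", "e"]
print(with_context(lines, 2, leading=1, trailing=1))
print(with_context(lines, 0, leading=2, trailing=2))
```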

http://img268.imageshack.us/img268/1599/v13anoactors.png

1)  Use Actors From: Use the actor/character names found in the .ass/.ssa files of either the Subs1 or Subs2 directory.

2)  Check Actors: Scan the .ass/.ssa files of the selected subtitle directory for actor/character names. If any names are found, they will be listed to the right. Since the actor/character name field is optional, not all .ass/.ssa files will have actor/character names associated with each line of dialog.

3)  Available Actors: The list of available actor/character names. Only lines that are associated with one of the selected names will be processed. For example, if you only want to hear the lines of the character "Makoto", deselect all other names and then select "Makoto" from the list. If no actors are selected, processing will take place as normal.
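
To see where actor names come from, here is a Python sketch that reads the name field of Dialogue rows in an .ass/.ssa file. It assumes the common field order (Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text); a real file declares its own order in the Format row of its Events section, which this sketch does not read.

```python
def dialogue_actor_and_text(row):
    """Return (actor, text) for a Dialogue row, or None for other rows."""
    if not row.startswith("Dialogue:"):
        return None
    # Split into at most 10 fields so commas inside the text are preserved.
    fields = row[len("Dialogue:"):].strip().split(",", 9)
    if len(fields) < 10:
        return None
    return fields[4].strip(), fields[9]


def lines_for_actors(rows, selected):
    """Keep the text of rows spoken by the selected actors (all if none)."""
    result = []
    for row in rows:
        parsed = dialogue_actor_and_text(row)
        if parsed is None:
            continue
        actor, text = parsed
        if not selected or actor in selected:
            result.append(text)
    return result


rows = [
    "Dialogue: 0,0:00:01.00,0:00:03.00,Default,Makoto,0,0,0,,行ってきます",
    "Dialogue: 0,0:00:04.00,0:00:06.00,Default,Chiaki,0,0,0,,おい, 待てよ",
]

print(lines_for_actors(rows, {"Makoto"}))
print(lines_for_actors(rows, set()))
```

With an empty selection every line is kept, mirroring the "processing will take place as normal" behavior described above.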

http://img82.imageshack.us/img82/257/v13anoadvlang.png

1)  Only process lines containing kanji: If selected, only lines containing one or more kanji will be processed.
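
A kanji check of this kind can be approximated in Python by testing for characters in the main CJK Unified Ideographs block (U+4E00 to U+9FFF). Which Unicode ranges subs2srs actually treats as kanji is an assumption here; rare characters in the extension blocks are not covered by this sketch.

```python
def contains_kanji(text):
    """Return True if the text has a character in U+4E00..U+9FFF."""
    return any("\u4e00" <= ch <= "\u9fff" for ch in text)


for sample in ["学校に行きます", "ありがとう", "Hello"]:
    print(sample, contains_kanji(sample))
```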

http://img84.imageshack.us/img84/3893/v13anovobsub.png

1)  Enable Custom Colors: Use custom colors when processing images extracted from Vobsub (.idx/.sub) files.

2)  Colors: Select a color to use for each part of the subtitle.

3)  Transparent: Select whether or not a part of the subtitle should be transparent. The color is ignored when this is enabled.

4)  Reset: Reset the colors and transparency settings to their defaults.

===========================================================
The Preview Interface

http://a.imageshack.us/img830/4858/v21anopreview.png

For Linux/Mac users: If the text shows box characters, please do the following:
  1. Open settings.txt (it’s in the subs2srs.exe directory)
  2. Find the “default_preview_east_asain_font” setting
  3. Set the “default_preview_east_asain_font” to a valid font name.

Overview:

The Preview interface will show you a list of the lines that may be processed by subs2srs when the Go! button is pressed. After entering the desired settings you can use this preview to verify that subs2srs is correctly matching lines between Subs1 and Subs2, as well as verify that the audio timings and snapshots are correct. If something doesn’t seem right to you, just modify the settings and click the Regenerate Preview button.

The preview will also show you which lines will be processed (and end up in the Anki export file) and which lines will be discarded. subs2srs provides this capability with the concept of active and inactive lines. Active lines (shown in pale green) will be processed by subs2srs and inactive lines (shown in pink) will be discarded. With the default settings, all lines are set to active. The active state of the lines can be affected by various settings such as the Span settings in the Main interface or the Prune settings in the Advanced Subtitle Options interface. In addition, you may hand pick exactly which lines are active and which lines are inactive with the Activate and Deactivate buttons.

For really pesky lines, you may use this interface to hand edit the text for each line. However, for the time being, you may not edit the timings for each individual line.

Note of caution: By clicking the Regenerate Preview button or exiting the Preview dialog, you will lose all hand edits that you have made to the text and to the active state for each line. In order to prevent accidents, subs2srs will warn you if you are about to wipe out your edits. It is recommended that you adjust the settings first and then hand edit the text and the active state.

1)    Episode: Allows you to select the episode to preview.

2)    Find: Allows you to search for the provided text in the current episode. The search will start from the currently selected line and will wrap around to the beginning if necessary. The search is not case-sensitive. Wild cards and regular expressions are not supported at this time.

    Special search options:

    Begin the search text with “a:” to search only active items. Example search: “a:search text”. Omit the search text to search for the next active item.

    Begin the search text with “i:” to search only inactive items. Example search: “i:search text”. Omit the search text to search for the next inactive item.
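
    The Find behavior described above (case-insensitive, wrap-around, optional “a:” and “i:” prefixes) can be sketched in Python like this. Starting the search at the line after the current selection is an assumption made for the example.

```python
def find_next(lines, query, start_index):
    """Find the next matching line index, wrapping around; None if absent.

    lines is a list of (text, is_active) tuples.
    """
    wanted_state = None
    if query.startswith("a:"):
        wanted_state, query = True, query[2:]
    elif query.startswith("i:"):
        wanted_state, query = False, query[2:]
    needle = query.lower()
    count = len(lines)
    for step in range(1, count + 1):
        index = (start_index + step) % count
        text, active = lines[index]
        if wanted_state is not None and active != wanted_state:
            continue
        if needle in text.lower():  # an empty needle matches every line
            return index
    return None


lines = [("Hello", True), ("hello again", False), ("Bye", True)]

print(find_next(lines, "HELLO", 0))    # plain search
print(find_next(lines, "a:hello", 0))  # active lines only
print(find_next(lines, "i:", 2))       # next inactive line
```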

3)    Statistics Box: Allows you to see the number of lines in the current episode and the total number of lines in all episodes. It includes a breakdown of the number of active and inactive lines.

4)    List of Lines: Displays each Subs1 line and its corresponding Subs2 line. Active lines are drawn in pale green and inactive lines are drawn in pink. This is a good place to make sure that subs2srs is matching up the lines correctly. If the lines aren’t matching very well, try adjusting the “Pre-Time Shift” in the Main interface and click the “Regenerate Preview” button to update the preview. Select a line from the list in order to preview its corresponding audio clip and snapshot (see below).

    Note: If you checked the snapshot preview box, expect a short delay when selecting a line.

5)    Select All: Select all lines in the list.

6)    Select None: Deselect all lines in the list.

7)    Invert: Invert the selected lines in the list.

8)    Activate: Make the selected lines active. subs2srs will only process lines marked as active. Active lines have a pale green background.

9)    Deactivate: Make the selected lines inactive. Inactive lines will not be processed by subs2srs and will not be placed into the Anki export file. Inactive lines have a pink background.

10)  Snapshot Preview Checkbox: Check this box to generate a preview of the snapshot. Generating a snapshot preview will introduce a slight delay when selecting lines.

11)  Snapshot Preview: A thumbnail preview of the snapshot will be generated for this line. This is a good place to check the Crop setting from the Generate Snapshots section of the Main interface. The thumbnail preview will scale to preserve the correct aspect ratio. Click on the thumbnail preview to display the un-scaled, actual-size snapshot based on the Dimensions setting.

12)  Subs1 Text:  The text of Subs1. You may edit the text if you so desire. For Vobsub (.idx/.sub) subtitles, the actual Vobsub image will be displayed here (unlike in the list). This is a good place to test custom Vobsub colors.

13)  Subs2 Text: The text of Subs2. You may edit the text if you so desire. For Vobsub (.idx/.sub) subtitles, the actual Vobsub image will be displayed here (unlike in the list). This is a good place to test custom Vobsub colors.

14) Preview Audio:  Play the audio clip associated with the currently selected line. This is a good place to make sure that your Post-Time Shift, Pad Timings, and Bitrate settings are acceptable.

    Note: Expect a short delay the first time you press the button after selecting a line. The audio is being extracted/cut during this delay. Subsequent clicks should be instantaneous until a new line is selected.

15)  Regenerate Preview: Each time you finish making updates to the settings (for example, to change the pad for the audio timings) you should click this button to update the preview with the latest settings.

Warning: By pressing this button you will lose all hand edits that you have made to the text and to the active state for each line.

16)  Go!: Start processing the subtitles. Only lines marked as active (the pale green lines) will be processed.

How to enable the hidden video preview option: In order to enable the video preview button on the video preview interface, open the settings.txt file (in the subs2srs.exe directory) and fill out the video_player and video_player_args options.

Example:

video_player = C:\Program Files\vlc\vlc.exe

video_player_args = --one-instance --start-time=${s_total_sec} --stop-time=${e_total_sec} --video-x=0 --video-y=0 --width=${width} --height=${height} --video-title=${s_hour}:${s_min}:${s_sec}

The tokens (things like ${s_total_sec}) that you may use are described at the top of the settings.txt file.
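
To get a feel for how ${...} tokens expand, here is a Python sketch using string.Template, which happens to use the same ${name} syntax. Only token names that appear in the example above are used; how subs2srs formats each value (for example, zero padding of minutes and seconds) is an assumption, so check the top of settings.txt for the authoritative descriptions.

```python
from string import Template


def expand_player_args(template, start_ms, end_ms, width, height):
    """Fill in the ${...} tokens of a video_player_args style string."""
    s_total = start_ms // 1000
    e_total = end_ms // 1000
    values = {
        "s_total_sec": s_total,
        "e_total_sec": e_total,
        "width": width,
        "height": height,
        "s_hour": s_total // 3600,
        "s_min": f"{(s_total % 3600) // 60:02d}",
        "s_sec": f"{s_total % 60:02d}",
    }
    # safe_substitute leaves any unknown tokens untouched.
    return Template(template).safe_substitute(values)


args = ("--start-time=${s_total_sec} --stop-time=${e_total_sec} "
        "--width=${width} --height=${height} "
        "--video-title=${s_hour}:${s_min}:${s_sec}")

print(expand_player_args(args, 75_000, 79_000, 480, 270))
```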


===========================================================
The Extract Audio from Media Tool Interface

http://img836.imageshack.us/img836/3808/v20anoextractaudiotool.png

Overview:

Use this tool to extract, convert, and split the audio track from a media file. You may extract the audio track as a single clip or break the audio track into multiple clips of a provided length. The audio track will be extracted in mp3 format.

1) Media: The media file(s) to extract, convert, and split.

Note 1: The filenames within must not contain any Unicode characters.
   
You may use the following wildcards in order to specify multiple files:
    * = Match zero or more characters
    ? = Match exactly zero or one character

Example: enter C:\Temp\*.avi to use all .avi files in C:\Temp.

Note 2: When working with multiple files, the files will be matched up alphabetically.

2) Audio Stream: Some videos contain multiple audio streams/tracks. This option allows you to select which audio to use.

3) Output: The directory where the generated .mp3 files will be placed.

4)    Span (h:mm:ss): Only process lines that start within the specified span of time.

5)  Bitrate: The bitrate of the extracted mp3 files. Higher bitrates mean higher quality at the expense of larger file sizes.

6)  Format: You have two options here: you can either extract the entire audio track as a single clip or you can break the audio track into multiple clips. If you choose the latter, you must specify the length of each clip (in h:mm:ss format).
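
Breaking a track into clips of a fixed length comes down to the arithmetic in this Python sketch. Letting the final clip be shorter than the others is an assumption for the example; the tool may treat the remainder differently.

```python
def clip_boundaries(total_ms, clip_ms):
    """Return (start_ms, end_ms) pairs covering the whole track."""
    if clip_ms <= 0:
        raise ValueError("clip length must be positive")
    return [(start, min(start + clip_ms, total_ms))
            for start in range(0, total_ms, clip_ms)]


# A 25 minute track split into 10 minute clips.
for start, end in clip_boundaries(25 * 60 * 1000, 10 * 60 * 1000):
    print(start, end)
```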

7) Lyrics:  Enable/Disable adding lyrics to each generated .mp3's ID3 Lyrics tag. The lyrics are based off of subtitle files. The timestamps in the lyrics are relative to the start of the clip.
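
"Relative to the start of the clip" means each subtitle time has the clip's start time subtracted from it. The Python sketch below shows that subtraction and prints the result with LRC-style [mm:ss.xx] timestamps; the exact text layout that subs2srs writes into the lyrics tag is an assumption here.

```python
def lrc_timestamp(ms):
    """Format milliseconds as an LRC-style [mm:ss.xx] timestamp."""
    minutes, rem = divmod(ms, 60_000)
    seconds, rem = divmod(rem, 1000)
    return f"[{minutes:02d}:{seconds:02d}.{rem // 10:02d}]"


def lyrics_for_clip(lines, clip_start_ms, clip_end_ms):
    """Build lyrics for one clip with times relative to the clip start."""
    out = []
    for start, _end, text in lines:
        if clip_start_ms <= start < clip_end_ms:
            out.append(f"{lrc_timestamp(start - clip_start_ms)}{text}")
    return "\n".join(out)


lines = [(605_000, 607_000, "a"), (1_300_000, 1_302_000, "b")]

# Second clip of a track split into 10 minute clips (10:00 to 20:00).
print(lyrics_for_clip(lines, 600_000, 1_200_000))
```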

8) Subs1:  The subtitle file(s) in the language that you are trying to learn (your target language) to use in the lyrics. For example, the Japanese subtitles for a TV show. The file(s) may be in .srt, .ass, .ssa, and/or .lrc format.

Note 1: The contents of any text-based subtitle file (.srt, .ass, .ssa, .lrc) must have a UTF-8 or ASCII encoding.
   
You may use the following wildcards in order to specify multiple files:
    * = Match zero or more characters
    ? = Match exactly zero or one character

Example: enter C:\Temp\*.srt to use all .srt files in C:\Temp.

Note 2: When working with multiple files, the files will be matched up alphabetically.

9) Subs2:  The corresponding subtitle file(s) in your native language for use in the lyrics. For example, the English subtitles for a movie. The file(s) may be in .srt, .ass, .ssa, and/or .lrc format. This field is optional. Leave blank if you do not want or have the corresponding set of subtitles.

Note 1: The contents of any text-based subtitle file (.srt, .ass, .ssa, .lrc) must have a UTF-8 or ASCII encoding.

You may use the following wildcards in order to specify multiple files:
    * = Match zero or more characters
    ? = Match exactly zero or one character

Example: enter C:\Temp\*.srt to use all .srt files in C:\Temp.

Note 2: When working with multiple files, the files will be matched up alphabetically.

Note 3 (important): Subs1 is compared against Subs2 and not the other way around.

This means that if a subtitle file from Subs1 contains 300 lines and a subtitle file from Subs2 contains 310 lines, the maximum number of lines that will be processed is 300 (the number of lines from Subs1).

Of course, this also means that if a subtitle file from Subs1 contains 310 lines and a subtitle file from Subs2 contains only 300 lines, then 310 lines will be processed and 10 of those lines will be mismatched. See following note.
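The consequence of comparing Subs1 against Subs2 can be sketched as below: the output always has one entry per Subs1 line, and a Subs1 line with no suitable Subs2 partner is left without a match. Matching by largest timing overlap is an assumption chosen for illustration; the actual matching logic in subs2srs is not described in this file.

```python
def overlap_ms(a_start, a_end, b_start, b_end):
    """Length in milliseconds of the time two lines are both on screen."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def match_subs(subs1, subs2):
    """For each Subs1 line, pick the Subs2 line that overlaps it the most.

    Both inputs are lists of (start_ms, end_ms, text). The result has
    exactly len(subs1) entries; unmatched lines get an empty Subs2 text.
    """
    pairs = []
    for start, end, text in subs1:
        best_text, best_overlap = "", 0
        for s2_start, s2_end, s2_text in subs2:
            current = overlap_ms(start, end, s2_start, s2_end)
            if current > best_overlap:
                best_text, best_overlap = s2_text, current
        pairs.append((text, best_text))
    return pairs


subs1 = [(0, 1000, "こんにちは"), (2000, 3000, "さようなら")]
subs2 = [(100, 900, "Hello"), (1100, 1500, "(door closes)"), (2100, 2900, "Goodbye")]
print(match_subs(subs1, subs2))
```

Note how the extra Subs2 line is simply never used, while a missing Subs2 line would leave its Subs1 line unmatched.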

10) Span (h:mm:ss):  Only process lines that start within the specified span of time. Span is applied after the time shift is applied.

11) Pre-Time Shift:  Shift Subs1 or Subs2 dialog timings before the lines are processed. This is equivalent to using an external program (such as Aegisub) to apply a time shift to the Subs1 or Subs2 subtitle file. Use this to reduce the gap between the Subs1 and Subs2 timings. This will reduce errors when processing/matching subtitle lines.

12) Post-Time Shift: Shift timings forward or backward by the provided number of milliseconds after the lines have been processed. For example, setting the shift to -1000 will make all subtitles start (and end) 1 second sooner.

Note: This is an option of convenience for the user as the same result can be achieved by adjusting the Pre-Time Shift option for both Subs1 and Subs2.
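The order of operations for Span and the two time shifts can be sketched like this. The names are hypothetical, and applying the Post-Time Shift after the span filter is my reading of the descriptions above rather than a confirmed detail of subs2srs.

```python
def shift(lines, offset_ms):
    """Move every (start_ms, end_ms, text) line by offset_ms."""
    return [(start + offset_ms, end + offset_ms, text) for start, end, text in lines]


def within_span(lines, span_start_ms, span_end_ms):
    """Keep only lines whose start time falls inside the span."""
    return [line for line in lines if span_start_ms <= line[0] <= span_end_ms]


def process(lines, pre_shift_ms, span_ms, post_shift_ms):
    shifted = shift(lines, pre_shift_ms)    # 1. Pre-Time Shift
    kept = within_span(shifted, *span_ms)   # 2. Span, applied after the shift
    return shift(kept, post_shift_ms)       # 3. Post-Time Shift (assumed last)


lines = [(500, 1500, "first"), (5000, 6000, "second")]
# Shift forward by one second, keep the first three seconds,
# then make everything start one second sooner again.
print(process(lines, 1000, (0, 3000), -1000))
```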

13) Name:  An arbitrary name to associate with the filenames of the generated files. Any space characters will be converted to underscores.

Note 1: You cannot use the following characters (as Windows will not allow these in filenames): \ / : * ? " < > |

Note 2: You cannot use any Unicode characters.

14) First Episode:  The first number to use when generating filenames. If you wanted to start at episode 3 (that is, the video files for episode 1 and episode 2 are not in the provided directories), then put a 3 in this box.
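The Name rules and the First Episode counter can be sketched together as follows. The Name_NNN output format is hypothetical and only meant to show how the starting number carries through; the real filenames subs2srs generates are controlled by the settings file.

```python
# Characters Windows does not allow in filenames (from Note 1 above).
FORBIDDEN = set('\\/:*?"<>|')


def clean_name(name):
    """Validate a Name value and convert spaces to underscores."""
    bad = sorted(set(name) & FORBIDDEN)
    if bad:
        raise ValueError("forbidden characters: " + " ".join(bad))
    if not name.isascii():
        raise ValueError("non-ASCII characters are not allowed")
    return name.replace(" ", "_")


def episode_names(name, first_episode, count):
    """Return one base filename per episode, starting at first_episode.

    The "Name_003" layout is an assumption made for this example.
    """
    base = clean_name(name)
    return [f"{base}_{number:03d}" for number in range(first_episode, first_episode + count)]


# Episodes 1 and 2 are missing, so numbering starts at 3.
print(episode_names("My Show", 3, 2))
```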

15) Extract Audio:  Start the audio extraction process.

Note: Audio extraction has been successfully tested with audio tracks that are in the following formats: AC3, AAC, VORBIS, and MP3. Make sure that your AC3 file does not have DRM protection.


===========================================================
The Dueling Subtitles Tool Interface

http://a.imageshack.us/img192/7646/v201anoduelingsubtitles.png

Overview:

Use this tool to create subtitle files (in .ass format) that will simultaneously display a line from Subs1 and its corresponding line from Subs2.  Here are some screenshots that demonstrate the Dueling Subtitles feature:

This screenshot shows the default style settings:
http://img39.imageshack.us/img39/135/sampleduelingsubtitlest.png

This screenshot shows Subs2 displayed at the top of the screen with smaller font and outlined in light brown with no shadow:
http://img96.imageshack.us/img96/8959/sampleduelingsubtitless.png

These subtitles can be used in any video player that accepts .ass subtitle files. In order to display the subtitles, it is usually as simple as renaming the subtitles to match the name of the video file and then opening the video file or finding your player's "open subtitles" feature. This feature might be useful for those who are hesitant about viewing only the foreign language subtitles for fear that they might not get as much enjoyment out of it, and for those who want to instantly check their listening comprehension.

1) Subs1: The subtitle file(s) in the language that you are trying to learn (your target language). For example, the Japanese subtitles for a TV show. The file(s) may be in .srt, .ass, and/or .ssa format.

Note 1: The contents of any text-based subtitle file (.srt, .ass, .ssa) must have a UTF-8 or ASCII encoding.

You may use the following wildcards in order to specify multiple files:
    * = Match zero or more characters
    ? = Match exactly zero or one character

Example: enter C:\Temp\*.srt to use all .srt files in C:\Temp.

Note 2: When working with multiple files, the files will be matched up alphabetically.

2) Subs2: The corresponding subtitle file(s) in your native language. For example, the English subtitles for a movie. The file(s) may be in .srt, .ass, and/or .ssa format. This field is optional. Leave blank if you do not want or have the corresponding set of subtitles.

Note 1: The contents of any text-based subtitle file (.srt, .ass, .ssa) must have a UTF-8 or ASCII encoding.

You may use the following wildcards in order to specify multiple files:
    * = Match zero or more characters
    ? = Match exactly zero or one character

Example: enter C:\Temp\*.srt to use all .srt files in C:\Temp.

Note 2: When working with multiple files, the files will be matched up alphabetically.

Note 3 (important): Subs1 is compared against Subs2 and not the other way around.

This means that if a subtitle file from Subs1 contains 300 lines and a subtitle file from Subs2 contains 310 lines, the maximum number of lines that will be processed is 300 (the number of lines from Subs1).

Of course, this also means that if a subtitle file from Subs1 contains 310 lines and a subtitle file from Subs2 contains only 300 lines, then 310 lines will be processed and 10 of those lines will be mismatched. See following note.

3) Output: The directory where the generated files will be placed.

4) Use Timings From: Subtitles contain timing information that determines when they appear and for how long. You may select to use the timing information of Subs1 or Subs2. The timings are used for combining lines from Subs1 and Subs2. This option may be useful if one of your subtitles was specifically timed for the specific video source that you are using.

5) Pre-Time Shift: Shift Subs1 or Subs2 dialog timings before the lines are processed. This is equivalent to using an external program (such as Aegisub) to apply a time shift to the Subs1 or Subs2 subtitle file. Use this to reduce the gap between the Subs1 and Subs2 timings. This will reduce errors when processing/matching subtitle lines.

6) Post-Time Shift: Shift timings forward or backward by the provided number of milliseconds after the lines have been processed. For example, setting the shift to -1000 will make all subtitles start (and end) 1 second sooner.

Note: This is an option of convenience for the user as the same result can be achieved by adjusting the Pre-Time Shift option for both Subs1 and Subs2.

7) Subs1 Style: Adjust the styling of the Subs1 lines in the Dueling Subtitles.

8) Subs2 Style: Adjust the styling of the Subs2 lines in the Dueling Subtitles.

9) Alignment Priority: When Subs1 and Subs2 both have the same alignment style, this option determines which one gets priority. For example:  If both Subs1 and Subs2 are aligned to the bottom of the screen, set this value to "Subs2" to force Subs2 to be drawn below Subs1.

10) Create a dueling subtitle every X lines: Determine the sequence at which a line from Subs2 is displayed simultaneously with the line from Subs1. Setting this value greater than one acts as a sort of "hint". For example: Set this value to 3 to display the corresponding line from Subs2 every 3 subtitle lines. The sequence would be:

1: Subs1 Line
2: Subs1 Line
3: Subs1 Line + Subs2 Line
4: Subs1 Line
5: Subs1 Line
6: Subs1 Line + Subs2 Line
Etc…
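The sequence above can be reproduced with a few lines of Python. This is only a sketch of the "every X lines" rule with a hypothetical function name; it says nothing about how subs2srs writes the resulting .ass file.

```python
def dueling_sequence(pairs, every):
    """Return (subs1, subs2_or_None) for each line.

    `pairs` is a list of (subs1_text, subs2_text). The Subs2 text is kept
    only on every `every`-th line; on all other lines it is dropped.
    """
    if every < 1:
        raise ValueError("every must be at least 1")
    result = []
    for number, (subs1, subs2) in enumerate(pairs, start=1):
        show_subs2 = number % every == 0
        result.append((subs1, subs2 if show_subs2 else None))
    return result


pairs = [(f"Subs1 line {n}", f"Subs2 line {n}") for n in range(1, 7)]
for subs1, subs2 in dueling_sequence(pairs, every=3):
    print(subs1 if subs2 is None else f"{subs1} + {subs2}")
```

With every set to 1, each Subs1 line is shown together with its Subs2 line.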
   
11) Also generate quick reference .txt file: In addition to the subtitle file, also generate a quick reference .txt file with just the Subs1 and Subs2 text. Can be useful for following the dialog and looking up unknown words.


12) Name of Deck: An arbitrary name to associate with the filenames of the generated files. Any space characters will be converted to underscores.

Note: You cannot use the following characters (as Windows will not allow these in filenames): \ / : * ? " < > |

13) First Episode: The first number to use when generating filenames. If you wanted to start at episode 3 (that is, the files for episode 1 and episode 2 are not in the provided directories), then put a 3 in this box.

14) Create Dueling subtitles!: Process each of the Subs1 and Subs2 subtitle files and combine them to create corresponding Dueling Subtitles in .ass format.


===========================================================
The Subtitle Style Interface

http://img190.imageshack.us/img190/4756/v18anoduelingsubtitless.png

Overview:

This interface allows you to modify the Subs1 or Subs2 subtitle style when generating Dueling Subtitles. It should be somewhat familiar to those who use Aegisub.

1)  Font:  Launches the standard font chooser dialog which allows you to choose the font and its size. It also allows you to select the bold, italic, underline, and strikeout settings.

2)  Primary Color:  Select the primary color and its opacity. Opacity ranges from 0 (opaque) to 255 (transparent). This is the color that a subtitle will normally appear in.

3)  Secondary Color:  Select the secondary color and its opacity. Opacity ranges from 0 (opaque) to 255 (transparent). This color may be used instead of the Primary color when a subtitle is automatically shifted to prevent an onscreen collision, to distinguish the different subtitles.

       Note: In practice, I don't think that I have ever seen the Secondary color used by a video player.

4)  Outline Color: Select the outline color and its opacity. Opacity ranges from 0 (opaque) to 255 (transparent).

5)  Shadow Color: Select the shadow color and its opacity. Opacity ranges from 0 (opaque) to 255 (transparent).

6)  Outline: The width (in pixels) of the outline that surrounds the text.

7)  Shadow: The depth (in pixels) of the shadow behind the text.

8)  Opaque box: Choose to display a box behind the subtitles.

9)  Alignment: Choose how text is aligned on the screen.

10)  Left Margin: Define the left margin in pixels. It is the distance from the left-hand edge of the screen. The three onscreen margins define areas in which the subtitle text will be displayed.

11)  Right Margin: Define the right margin in pixels. It is the distance from the right-hand edge of the screen. The three onscreen margins define areas in which the subtitle text will be displayed.

12)  Vertical Margin: Define the vertical margin in pixels. It is the distance from the top/bottom edge of the screen. The three onscreen margins define areas in which the subtitle text will be displayed.

13)  Scale X: Percent to modify the width of the text.

14)  Scale Y: Percent to modify the height of the text.

15)  Rotation: Number of degrees to rotate the text. The origin of the rotation is defined by the alignment.

16)  Spacing: Extra space between characters (in pixels).

17)  Encoding: This specifies the font character set or encoding and on multi-lingual Windows installations it provides access to characters used in more than one language. You can probably ignore this setting.
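For readers curious how the color and opacity settings (options 2 through 5) end up in an .ass file: the Advanced Substation Alpha style format stores each color as &HAABBGGRR, where AA is the 0 (opaque) to 255 (transparent) value and the color channels are written in blue, green, red order. The helper below is a hypothetical sketch of that encoding and is not taken from subs2srs.

```python
def ass_color(red, green, blue, alpha=0):
    """Encode a color for an .ass style line as &HAABBGGRR.

    alpha follows the same convention as the options above:
    0 is fully opaque and 255 is fully transparent.
    """
    for value in (red, green, blue, alpha):
        if not 0 <= value <= 255:
            raise ValueError("each component must be between 0 and 255")
    return f"&H{alpha:02X}{blue:02X}{green:02X}{red:02X}"


print(ass_color(255, 255, 255))        # opaque white
print(ass_color(255, 0, 0, alpha=128)) # half-transparent red
```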

===========================================================
Frequently Asked Questions

1) Why do certain characters from my text-based subtitles look strange when I view them in the Preview Interface or when I import them into Anki?

subs2srs can only handle text-based subtitle files encoded in UTF-8 format. To solve your issue, you need to convert your subtitles from their existing encoding to UTF-8 encoding. The easiest way is to open the file in Microsoft Word and save it as a UTF-8 encoded file. If you don't have Word, try using Notepad++ instead:

  Step 1: Open the non-UTF8 encoded subtitle file in Notepad++.
  Step 2: From the menu select “Encoding | Character Sets | <<navigate to the character set that your subtitle file is in>>”.
  Step 3: From the menu select “Edit | Select All” (to highlight all of the text).
  Step 4: From the menu select “Edit | Copy” (to copy the text).
  Step 5: From the menu select “File | New” (to open a new tab).
  Step 6: From the menu select “Encoding | Encode in UTF-8” (to set UTF-8 encoding).
  Step 7: From the menu select “Edit | Paste” (to paste the text into the new file).
  Step 8: From the menu select “File | Save”.
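If you are comfortable with a little scripting, the same conversion can be done in Python. This is an alternative to the steps above, not part of subs2srs. You still need to know the encoding your subtitle file is currently in; "shift_jis" and the file names below are only examples.

```python
from pathlib import Path


def reencode(data, source_encoding):
    """Decode bytes from source_encoding and return them encoded as UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


def convert_file(source_path, target_path, source_encoding):
    """Write a UTF-8 copy of a subtitle file, leaving the original untouched."""
    original = Path(source_path).read_bytes()
    Path(target_path).write_bytes(reencode(original, source_encoding))


# Example (file names are placeholders):
# convert_file("episode01.srt", "episode01.utf8.srt", "shift_jis")
print(reencode("こんにちは".encode("shift_jis"), "shift_jis").decode("utf-8"))
```

If the wrong source encoding is given, Python will usually raise a UnicodeDecodeError rather than silently produce garbled text, which makes it easier to notice a wrong guess.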

2) Why don’t some of the lines from my text-based subtitles get processed? Also why don’t some lines from my subtitle file look exactly the same after being processed?

First, see the notes in the “5) Subs2…” part of the Main Interface section. To add to that, subs2srs tries to fix mismatched lines by combining one or more lines to form a better match.

It could also be that the subtitle parser is removing or modifying lines:

Subrip (.srt)
• The italic, bold, and underline style tags will be removed.
• A space will be added in between each line of a multiline subtitle.

Advanced Substation Alpha (.ass/.ssa)
• Lines beginning with a “{” will not be processed because they are probably part of an unwanted karaoke effect.
• “\N” style newlines will be replaced with a space.
• Embedded styles (anything between two curly braces) will be removed.
• Blank lines will be removed.

Lyric (.lrc)
• Lines with the same start and end times will be removed because they are probably metadata or advertisements.
• Lines containing a colon will be removed because they are probably metadata.
• Lines containing “www.” will be removed because they are probably advertisements.
• Blank lines will be removed.
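The .srt and .ass rules listed above can be approximated with a couple of regular expressions. The sketch below is my own approximation of those rules for illustration, with hypothetical function names; the real parser in subs2srs may differ in details.

```python
import re

SRT_STYLE_TAGS = re.compile(r"</?[ibu]>", re.IGNORECASE)  # <i>, <b>, <u>
ASS_EMBEDDED_STYLES = re.compile(r"\{[^}]*\}")            # {\i1}, {\k20}, ...


def clean_srt(text):
    """Strip italic/bold/underline tags and join multiline text with spaces."""
    without_tags = SRT_STYLE_TAGS.sub("", text)
    parts = [line.strip() for line in without_tags.splitlines() if line.strip()]
    return " ".join(parts)


def clean_ass(text):
    """Return the cleaned dialog text, or None if the line is skipped."""
    if text.startswith("{"):
        return None  # probably a karaoke effect
    cleaned = ASS_EMBEDDED_STYLES.sub("", text).replace("\\N", " ").strip()
    return cleaned or None  # blank lines are removed


print(clean_srt("<i>Hello</i>\nworld"))
print(clean_ass("Hi {\\i1}there{\\i0}\\Nfriend"))
```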

3) I’m getting this error message: “Failed to extract the audio from the video. Make sure that the video does not have any DRM restrictions.” What did I do wrong?

You haven’t done anything wrong. subs2srs relies on a tool called ffmpeg to do all of its video and audio processing. ffmpeg is a great tool, but it’s not perfect and sometimes it will refuse to process the audio (especially multi-channel audio). You can work around this by extracting the audio from the video file, converting the audio to mp3 format, and then providing that mp3 to subs2srs (see the “15) Source” part of the Main Interface section).
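One way to do that manual extraction is to call ffmpeg yourself. The sketch below builds such a command in Python. It is not the command line subs2srs uses internally, the function names and file names are placeholders, and it assumes your ffmpeg build includes the libmp3lame encoder. Downmixing to stereo with -ac 2 is my own suggestion for stubborn multi-channel tracks.

```python
import subprocess


def ffmpeg_mp3_command(video_path, mp3_path, audio_stream=0, bitrate_kbps=128):
    """Build an ffmpeg command that converts one audio stream to mp3."""
    return [
        "ffmpeg",
        "-i", video_path,
        "-vn",                         # ignore the video
        "-map", f"0:a:{audio_stream}", # pick the Nth audio stream (0-based)
        "-c:a", "libmp3lame",          # encode as mp3
        "-b:a", f"{bitrate_kbps}k",
        "-ac", "2",                    # downmix to stereo
        mp3_path,
    ]


def extract_audio(video_path, mp3_path, **options):
    """Run the command; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(ffmpeg_mp3_command(video_path, mp3_path, **options), check=True)


print(" ".join(ffmpeg_mp3_command("episode01.mkv", "episode01.mp3")))
```

The resulting mp3 can then be supplied to subs2srs in place of the audio inside the video file.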

4) Are there any command line options?

Just one. The first argument may be the path to a .s2s file that you want to load upon startup. Note: You can create a .s2s file with the “File | Save…” menu option.

5) What’s up with that settings.txt file in the subs2srs directory? Can I modify it?

settings.txt is the settings file for subs2srs. The settings file contains a variety of settings related to program defaults, output formats, and also contains a few hidden options whose use wasn’t quite widespread enough to put into the graphical user interface. Each setting is fairly well documented in the settings file itself. Also be sure to read the note about tokens at the top of the settings file. Many of the settings use these tokens.

6) Where can I find foreign language subtitles?

I only know of these resources:
http://www.d-addicts.com/forum/subtitles.php has a large amount of subtitles for Asian dramas. It also has English subtitles.
http://kitsunekko.net/subtitles/japanese/ has a very large amount of subtitles for Anime.

7) What tool can I use to modify subtitle timings?

Use Aegisub (http://www.aegisub.org/).

8) What tool(s) can I use to extract subtitles from an MKV file?

Use MKVExtractGUI (http://www.videohelp.com/tools/MKVExtractGUI). It requires mkvtoolnix (http://www.bunkus.org/videotools/mkvtoo … loads.html).

9) How can I extract subtitles from a DVD?

To convert DVD subtitles to Vobsub subtitles, use VSRip (http://www.videohelp.com/tools/VSRip).

10) Why does subs2srs create .png files instead of text when I provide subtitles in Vobsub (.idx/.sub) format?

Vobsub (.idx/.sub) is an image-based subtitle format, not a text-based subtitle format. subs2srs merely extracts the images and converts them to a format that can be used with Anki (.png).

If you want to convert Vobsubs to text-based subtitle format (like .srt), you will need to use some sort of OCR software such as SubRip (http://www.videohelp.com/tools/Subrip).

11) I have a problem with subs2srs and nothing in this file has helped me to resolve it. What is the best way to get my problem resolved?

E-mail me at <<my user name>> 'at' 'gmail' 'dot' 'com'. Be sure to attach the subtitle files and, if applicable, the generated .tsv file.

12) I have a feature suggestion. Are you willing to entertain it?

Sure, post the suggestion in the feedback thread at http://forum.koohii.com/viewtopic.php?id=2643&p=1. Most of the features currently in subs2srs were originally suggestions from the community so don’t hesitate.

Last edited by cb4960 (September 05, 12:21 am)

Ryuujin27
Member
Registered: 2006-12-14
Posts: 594

Well done, cb4960! I must say this program looks wonderful! I haven't experimented much with putting audio/pictures/etc into anki yet, but this program makes me want to try it!

I've seen a ton of shows and movies that would have some excellent lines to know, but I would usually just add them as plain text with no other accompanying items (besides a reference to where it came from). However, this seems like a much better way to do that, and quite simple too!

Thanks a lot! As soon as I get a chance (not soon) I'll try it out, but I'm sure someone will beat me to that.

Omnistegan
Member
From: Alberta Canada
Registered: 2009-01-10
Posts: 31

Very nicely done, it's too bad Japanese subs are still hard to come by.
I'll certainly be using this where I can.

stoked
Member
From: Switzerland
Registered: 2009-01-09
Posts: 378
Website

I'd love to see a linux port of this. Cool stuff.

Omnistegan
Member
From: Alberta Canada
Registered: 2009-01-10
Posts: 31

stoked wrote:

I'd love to see a linux port of this. Cool stuff.

ooo, yes, I forgot to say that. A linux port would be amazing.

nest0r
Member
Registered: 2007-10-19
Posts: 3000

Thanks! This looks a lot easier.

gilozoaire
Member
From: BeerLand
Registered: 2008-06-16
Posts: 20

If you can, just virtualize windows with vmware or virtualbox... It works incredibly well

I'm going to try that program very soon :D

Thanks a lot for making it simpler for mere mortals ;)

mistamark
Member
From: Japan
Registered: 2008-03-26
Posts: 115
Website

cb4960 wrote:

Hello,

I have created a new version of subs2srs. See the usage file (attached below) to get an idea of what you can do with this utility.

This is fully awesome! Will you ever make the source code available ?? [An interested developer]

radical_tyro
Member
Registered: 2005-11-19
Posts: 225

cb4960, this is just too awesome. seriously, my hats off to you, pal.

timcampbell
Member
From: 北京
Registered: 2007-11-04
Posts: 187

Just want to say I'm 100 cards into this deck already and the file works brilliantly. It's such a great tool, with maybe the best potential of any I've used. It provides natural native sound files, with full japanese subs so you can look up words/grammar you don't know, and offered in sentence by sentence chunks that you can SRS. It's a goldmine. Today I rewatched the first 15 minutes of the anime and all the little bits I couldn't figure out before are clear as day. Awesome thanks. Unfortunately I work on a mac, so I'll have to track down a sympathetic friend to create other decks with this program and share them.  Many many thanks.

Last edited by timcampbell (2009 February 01, 2:07 am)

zazen666
Member
From: japan
Registered: 2007-08-09
Posts: 646

GD! Thats cool!

markal
Member
From: Tokyo
Registered: 2007-10-22
Posts: 77

This is just amazing. You are very talented to create such an application and very generous to share it with others.

HerrPetersen
Member
From: Germany
Registered: 2007-01-02
Posts: 227

First, great respect and thanks for this program! I have a problem/question concerning snapshots:
I downloaded AviSynth 2.5 and chose the option to create snapshots.
subs2srs created audio-files and a text file, however the snapshots were not created (the media-directory did not include them).
I made a snapshot of the snapshotripping process and it reads like this:

Source :
    *Filename:"d:\....avs"
    *Fourcc: None (RGB32)
    *Frames: 240
    *Resolution: 1480x342
    *Frame rate: 24.000 FPS
Compressor:
    * No Recompression
Destination:
    *Format: Null

I did not change the standard framerate and other settings in subs2srs.

Also, while the sub-file I have seems pretty consistent in timing, I found out that the subs are a little bit behind the sound, so 你好 only produces 好 sound-wise. So maybe an adjusting option would be good to have. (And the people who sub stuff probably do so by their own definition of what is "correct" timing.)

Last edited by HerrPetersen (2009 February 01, 6:00 am)

undead_saif
Member
From: Jordan
Registered: 2009-01-28
Posts: 159

Thanks m8t, with the sound too? this will be a great help, I'm far from using it, but I already downloaded it and bookmarked the page for instructions, maybe there will be a better version by the time I need it, but its better this way, thanks a bunch!

Tobberoth
Member
From: Sweden
Registered: 2008-08-25
Posts: 3362

Won't the import files be quite enormous? I mean, just look at a random subtitle file, it's generally several hundred lines of dialogue. Getting through just one movie would probably take you quite a while... and most of the dialogue will probably be stuff you allready understand.

While the program is really cool, I'm wondering if it's such a good idea to use it. To bring up Khazu, he usually says that you need to learn 10 000 sentences as fast as possible, make them count by picking out the ones you really need. Taking every line in a whole movie isn't really picking the important ones.

HerrPetersen
Member
From: Germany
Registered: 2007-01-02
Posts: 227

Agree with Tobberoth - but still if there was a "input editor" of some kind you could skim through the lines and only check/uncheck those that you think are important/(not important) for you.
The possibilities are still huge!

nest0r
Member
Registered: 2007-10-19
Posts: 3000

Tobberoth wrote:

Won't the import files be quite enormous? I mean, just look at a random subtitle file, it's generally several hundred lines of dialogue. Getting through just one movie would probably take you quite a while... and most of the dialogue will probably be stuff you allready understand.

While the program is really cool, I'm wondering if it's such a good idea to use it. To bring up Khazu, he usually says that you need to learn 10 000 sentences as fast as possible, make them count by picking out the ones you really need. Taking every line in a whole movie isn't really picking the important ones.

This software makes it easier to develop decks based both on what individuals want and what they need, based on analyses and manipulation of the data once it's collected. Gives us more control, allowing for diverse, distributable, user-specific corpora. There's all kinds of possibilities with this, I'm sure people that are more database/list savvy than I can offer and develop more concrete examples.

And I mean, in addition to the interface/filtering stuff in regards to culling excess words/sentences from these specific decks. (I guess you could do some kind of check against import files you already have, eliminating redundant lines?) Just imagine, for example, analyzing the decks created by different users, cross-referencing them based on different taxonomies/genres/themes (to create, for example, frequency lists), and creating condensed decks from those that a person can select depending on whatever a person wants or needs to study.

Is there someplace to use as a 'headquarters' now that ajatt.pseudosphere is gone? Bad timing with that.

Last edited by nest0r (2009 February 01, 7:15 am)

zodiac
Member
Registered: 2008-04-01
Posts: 120

nest0r wrote:

Is there someplace to use as a 'headquarters' now that ajatt.pseudosphere is gone? Bad timing with that.

I wonder about the copyright issues with distributing/sharing the decks. ajatt.pseudosphere gave access to sentences from books only to those with proof of ownership but I remember they freely posted links to subs - what would be acceptable for distributing the decks?

Nukemarine
Member
From: 神奈川
Registered: 2007-07-15
Posts: 1962

Just from looking at this and seeing website like reading tutor (http://language.tiu.ac.jp/index_e.html) and iKnow, it's easy to see what can happen:

User creates file from a movie (sadly, all sorts of copyright issues, but bear with me). User uploads files onto iKnow so it has image, audio, sentence, and translation (maybe in multiple languages).  iKnow parses inputted sentences so now any and all words used in the movie are connected to the files. An upgraded version of iKnow lets you automatically load all sentences from that file with new words not yet studied by you (I say new version as this is not possible yet).

Users that don't like using iKnow then use Anki to download that generated list so now they have only the sentences from the movie file that are words they have not learned.

Yeah, a good ways in the future. However, looking at upgrades to iKnow, Anki, Reading Tutor, and this recent program, it's easy to see there's much that can be done to streamline the self study process.

nac_est
Member
From: Italy
Registered: 2006-12-12
Posts: 617

Tobberoth,
what you say is true, but it's still much faster to remove the cards you don't need than manually adding the ones you need, isn't it?

KREVA
Member
From: USA
Registered: 2008-09-12
Posts: 176

So how would one go about ripping subtitles from a movie into a subtitle file that can be put to good use with this program?  If OCR is the only way, I think I'll pass (had bad experiences with using OCR back in the day).

Nukemarine
Member
From: 神奈川
Registered: 2007-07-15
Posts: 1962

Kreva, there is one option but you may not like it: Create a Hard sub video with Japanese Kanji (I use Xilisoft DVD to DivX to make mine). Then subs2srs with an English sub file for appropriate parsing.

Thing is though, it's going to suck as the English sub timing is not going to match up well with the Japanese. Like I said, it's an option you may not like.

The only other option is to just use sub-files that exist, like from drama addict forums. In truth, a one hour show should net you 700 "cards", which should last you a bit. Granted, every new show you add should have fewer and fewer useful cards in terms of new vocabulary or phrases.

radical_tyro
Member
Registered: 2005-11-19
Posts: 225

tobberoth, i set up a shortcut in anki to suspend (or delete) a fact, so as i'm learning new cards i just hit that key every time i completely understand a line. takes like 2 seconds each time and i basically get to watch the movie as i'm doing it (they're presented in order).

timcampbell
Member
From: 北京
Registered: 2007-11-04
Posts: 187

Tobberoth wrote:

Won't the import files be quite enormous? I mean, just look at a random subtitle file, it's generally several hundred lines of dialogue. Getting through just one movie would probably take you quite a while... and most of the dialogue will probably be stuff you allready understand.

While the program is really cool, I'm wondering if it's such a good idea to use it. To bring up Khazu, he usually says that you need to learn 10 000 sentences as fast as possible, make them count by picking out the ones you really need. Taking every line in a whole movie isn't really picking the important ones.

I just suspend the cards I don't need. It's much faster this way than cutting and pasting a thousand sound files into Anki, which is a great way to learn yet very time consuming - I've tried it.

Without hijacking the thread for more Khatz commentary (he has his own thread going hot and heavy right now) he also said, and correctly I believe, that you need to enjoy what you're learning. I love this movie, and I'm having a great time working through the sound files. For other people, that might not be so. OK, find a movie you love. Or don't do it at all. Doesn't matter to me.

Also, for shadowing purposes, movie and TV clips have the most natural Japanese, and using sound files in anki makes this really easy. I don't shadow the main character, since she's a teenage girl, but her guy friends are fair game. It's much better than stopping and rewinding a CD, and the voices are very natural in this movie.

nest0r
Member
Registered: 2007-10-19
Posts: 3000

Yes, suspending cards is easy enough, though I do like the idea of integrating a level checker with some kind of overarching user database. I just keep thinking about someone who's learning Japanese, and is in a mode where they want to learn Japanese 'in general', but want, say, real-world 'business Japanese' lessons, or to know enough Japanese to watch plenty of current anime of any specific type (scifi, slice of life, shounen, whatever), and can just check out a file made from the most frequent words of 50 different sources of that type, eliminating redundancies with a level checker, et cetera.

Also, I've actually got a large number of redundant cards from iKnow, but I keep them for the speaking practice--I either suspend a card or just grade it on how well I can accurately reproduce the basic pitch and flow. It really helps with my 'ear training' and development of subvocal/articulatory rehearsal (and speaking ;p).

I guess someone needs to hijack a Somalian pirate ship and set up wifi there to resolve our copyright dilemma. Our motto: "Avast ye landlubbers, no int'l copyright laws be keelhaulin' arrr language-learning, yarrr! Yo ho ho, どうもありがとう!"

Last edited by nest0r (2009 February 01, 4:00 pm)