Epwing2Anki - Tool For Automatically Generating Anki Vocabulary Cards

#26
Tori-kun Wrote:@cb4960: Thanks for your reply! I tried as you said, however...

My japanese.txt file has the following format:
Expression | Reading | Meaning (in German) | Audio

Now, how can I preserve exactly this data and just "add" the sentence/definition from a J-J and a J-E dictionary to this already existing file or another new .tsv file, that still has the information (i.e. German meaning and [sound:$a.mp3] audio path) saved from the original japanese.txt file?
Right now, I have to add the definition and sentence by hand, copying and pasting from the 研究社/明鏡 dictionaries, which is a bit troublesome.

Perhaps you could add the field "Audio" and "Saved Meaning" to Epwing2Anki?
Okay, I get it now. What I can do is add an option to append the line from the original word list to the Epwing2Anki import file. Should be simple enough.
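For illustration only, here is a rough Python sketch of what such an append step could look like. It assumes both files are tab-separated and that the first column of each line is the expression; the function name, the column layout, and the sample data are all hypothetical and do not describe how Epwing2Anki actually does it.

```python
def append_wordlist_fields(wordlist_lines, import_lines, sep="\t"):
    """Append each original word-list line to the import line with the same expression."""
    # Map expression (assumed to be the first column) -> full original line.
    originals = {}
    for line in wordlist_lines:
        line = line.rstrip("\r\n")
        if line:
            originals.setdefault(line.split(sep)[0], line)

    merged = []
    for line in import_lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        extra = originals.get(line.split(sep)[0])
        # Lines without a match in the word list are passed through unchanged.
        merged.append(line + sep + extra if extra is not None else line)
    return merged


# Made-up sample data in the Expression | Reading | Meaning | Audio layout.
wordlist = ["毒性\tどくせい\tGiftigkeit\t[sound:dokusei.mp3]"]
imports = ["毒性\texample sentence\tdefinition"]
result = append_wordlist_fields(wordlist, imports)
```

Appending the whole original line (rather than picking fields) keeps the German meaning and the audio path intact without the tool needing to know what each column means.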
#27
cb4960, is there any chance to see this for MacOS?
#28
I had an idea this morning... in order to avoid getting sentences full of vocab I don't know, what would be cool would be if there was a way to check the contents of the output file vs. a dump of my deck to see if there are any kanji not in the word list and not in the deck that show up in the sentences. (If that makes sense.)

So if I run E2A with a word list, it would grab a bunch of sentences, then reference that against the .tsv/.csv Anki dump file of my deck, and knock out anything that doesn't show up in either the word list or the deck. That would save a tremendous amount of time.

Or can I just do that somehow with the sorter? I forget.
#29
LazyNomad Wrote:cb4960, is there any chance to see this for MacOS?
No.
#30
rich_f Wrote:I had an idea this morning... in order to avoid getting sentences full of vocab I don't know, what would be cool would be if there was a way to check the contents of the output file vs. a dump of my deck to see if there are any kanji not in the word list and not in the deck that show up in the sentences. (If that makes sense.)

So if I run E2A with a word list, it would grab a bunch of sentences, then reference that against the .tsv/.csv Anki dump file of my deck, and knock out anything that doesn't show up in either the word list or the deck. That would save a tremendous amount of time.

Or can I just do that somehow with the sorter? I forget.
Maybe I can add this *someday*. Not soon though.
#31
@cb4960: Thank you!
#32
Hmm... I suppose a simple separate program that just did what I was talking about above (kind of like the sorter perl script) is all that I'd really need.

Time to brush up on the python skillz. >_>a
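Something along these lines might be a starting point. It is only a sketch: it treats "kanji" as anything in the CJK Unified Ideographs block (U+4E00 to U+9FFF), reads the word list and deck dump as plain text, and the function names and sample data are made up.

```python
def kanji_in(text):
    """Return the set of kanji (CJK Unified Ideographs, U+4E00-U+9FFF) found in text."""
    return {ch for ch in text if "\u4e00" <= ch <= "\u9fff"}


def filter_sentences(sentences, word_list_text, deck_dump_text):
    """Keep only sentences whose kanji all appear in the word list or the deck dump."""
    known = kanji_in(word_list_text) | kanji_in(deck_dump_text)
    return [s for s in sentences if kanji_in(s) <= known]


# Made-up inputs: a two-word list, a tiny deck dump, and two candidate sentences.
known_words = "毒性\n菌"
deck_dump = "打ち込む\t込む"
sentences = ["毒性の菌", "シイタケの菌を榾木に打ち込む。"]
kept = filter_sentences(sentences, known_words, deck_dump)
```

Here the second sentence is dropped because 榾 and 木 appear in neither the word list nor the deck. Checking individual kanji is cruder than checking whole words, but it needs no tokenizer.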
#33
Hello,

I have just released version 1.1 of Epwing2Anki.

Download Epwing2Anki v1.1 via SourceForge

What Changed?

● Added support for the 『大辞泉』 EPWING dictionary.

● Added support for the 『明鏡国語辞典』 EPWING dictionary.

● Added option to append lines from word list to end of import file lines. (Thanks Tori-kun!)

● Added option to create a separate card for each example sentence. (Thanks rich_f!)

● Added "Example Sentences (translation only)" field. (Thanks rich_f!)

● Fixed bug that prevented examples from being auto-chosen in priority order.

● Upgraded to eplkup 1.2.1 which has better gaiji support - especially for『大辞泉』and 『明鏡国語辞典』.

● Tatoeba searches are now much faster.

● User can now press the 1-9 keys on the Disambiguate Entries dialog to select the
corresponding entry.

● Punctuation (full stops, capitalization, periods) is now added to example sentences
that needed it.

● Fixed bug where link tag was not removed in EPWING parsers.

● 広辞苑第六版: Remove the source text from example sentences.

● 広辞苑第六版: Fixed case where EB Library returns unrelated entries when
the looked up entry contains an initial '○' character.

● 広辞苑第六版: Extract example sentences in case where they are followed with '↔' or '→'.

● Fixed potential crash bug in EPWING parsers.

● Other minor parser tweaks.

cb4960
Edited: 2012-07-21, 3:51 pm
#34
Woot! Can't wait to give it a whirl tonight.
#35
Hello,

I have just released version 1.2 of Epwing2Anki.

Download Epwing2Anki v1.2 via SourceForge

What Changed?

● Added fine-tune dialog that contains the following options:

■ Compact definitions (place the entire definition on a single line) - This option was
moved from the Setup Inputs and Outputs page.

■ Append short name of source dictionary to example sentences.

■ Text to place in front of examples. Default is '▲'.

■ For EDICT: Remove word type indicators [example: (v1,n)].

■ For EDICT: Remove "popular" indicator [example: (P)].

■ For 研究社 新和英大辞典 第5版: Remove entries with definitions that don't
contain alpha characters (a-z or A-Z).

■ For J-J Dics: Keep examples in the definition.

■ For J-J Dics: Remove the '‐' and '・' characters from readings.

● Fixed case that allowed blank cards to be generated.

● 研究社 新和英大辞典 第5版: Remove '◧' and '◨' from expression.

cb4960
Edited: 2012-07-22, 5:39 pm
#36
First Rikaisama, then this! cb4960, you're unbelievable!
This program is just amazing... I've just made an 8000-word deck with words from the official jouyou kanji list. Well, I still need to make some adjustments, but it should be fine pretty soon^^

I was using Kenkyuusha 5th, but it's probably too much information (while EDICT is too little), so I was wondering if you could add support for Kenkyusha shin eiwa-waei chujiten in the next release. I have this EPWING and I've found it explains things in a really clear way and in fewer words.
Alternatively, I'd ask if it's possible to grab definitions online from Sanseido like Rikaisama does.

If you think you can add one of these, I'll wait so I can make a better deck with more choices for dictionaries (unfortunately I can't find any other EPWING dictionaries).

Yoroshiku onegai itashimasu m(_ _)m
#37
(1.2) Fine tune generates an exception, and I can't generate the sample word list either. The log is empty, it just doesn't do anything. It creates a file 1KB, UTF-8, but that's it.

I'll try 1.1 to see if it does the same thing.

Ah, figured out the problem with 1.2. Don't copy over the previous install! (Doh!) A fresh copy in a fresh directory fixed everything.

Found this in one of the sentences from the Daijirin for 毒性:

Quote:あいた,<img src="data:image/bmp;base64,Qk3eAAAAAAAAAD4AAAAoAAAAEAAAABAAAAABAAEAAAAAAKAAAABtCwAAbQsAAAIAAAACAAAA////AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAADAAAADAAAADAAAADAAAAHAAAAOAAAAMAAAAAAAAAAAAAAAAAAAAAAAAAA" alt="?"/><img src="data:image/bmp;base64,Qk3eAAAAAAAAAD4AAAAoAAAAEAAAABAAAAABAAEAAAAAAKAAAABtCwAAbQsAAAIAAAACAAAA////AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAMAAAAAwAAAADAAAAAMAAAAA4AAAABwAAAADAAAAAAAAAAAAAAAAAAAAAAAA" alt="?"/>。あ,___なお方なあ/滑稽本・浮世風呂 2。
And in the Kenkyuusha 5th for 菌:
Quote:シイタケの菌を榾木(<sub>ほ</sub><sub>た</sub><sub>ぎ</sub>)に打ち込む。
Probably need something to filter out HTML from sentences... but the stuff from the 菌 entry... is that for Ruby? I thought I turned that off. >_>a

That said, I dumped my 332-word list from KO Book 2, and it created about 1500 sentences, but couldn't find entries for 40 of the words. (Not sure what's up with that.)

EDIT: One thing that would be another neat option: if no sentences are found, just create placeholder entries for those words in the output file. It makes it easier to add sentences to the spreadsheet later if you're following some kind of order like KO or KiC.

Still, it saves a ton of time. This and dictscrape in some combo could seriously blast through this stuff.
Edited: 2012-07-22, 6:49 pm
#38
kazeatari Wrote:I was using Kenkyuusha 5th, but it's probably too much information (while EDICT is too little), so I was wondering if you could add support for Kenkyusha shin eiwa-waei chujiten in the next release. I have this EPWING and I've found it explains things in a really clear way and in fewer words.
Alternatively, I'd ask if it's possible to grab definitions online from Sanseido like Rikaisama does.

If you think you can add one of these, I'll wait so I can make a better deck with more choices for dictionaries (unfortunately I can't find any other EPWING dictionaries).
I can probably add support for Kenkyusha shin eiwa-waei chujiten next weekend. It's harder to parse than the other dictionaries though because a lot of the example sentences are behind links so I'll have to add support for this in my eplkup tool.

It's probably going to be a long while before I add support for any web-based dictionaries.
#39
rich_f Wrote:(1.2) Fine tune generates an exception, and I can't generate the sample word list either. The log is empty, it just doesn't do anything. It creates a file 1KB, UTF-8, but that's it.

I'll try 1.1 to see if it does the same thing.

Ah, figured out the problem with 1.2. Don't copy over the previous install! (Doh!) A fresh copy in a fresh directory fixed everything.
You got this exception because the settings.e2a file is not compatible from one version to the next.

rich_f Wrote:Found this in one of the sentences from the Daijirin for 毒性:
Quote:あいた,<img src="data:image/bmp;base64,Qk3eAAAAAAAAAD4AAAAoAAAAEAAAABAAAAABAAEAAAAAAKAAAABtCwAAbQsAAAIAAAACAAAA////AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAADAAAADAAAADAAAADAAAAHAAAAOAAAAMAAAAAAAAAAAAAAAAAAAAAAAAAA" alt="?"/><img src="data:image/bmp;base64,Qk3eAAAAAAAAAD4AAAAoAAAAEAAAABAAAAABAAEAAAAAAKAAAABtCwAAbQsAAAIAAAACAAAA////AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAMAAAAAwAAAADAAAAAMAAAAA4AAAABwAAAADAAAAAAAAAAAAAAAAAAAAAAAA" alt="?"/>。あ,___なお方なあ/滑稽本・浮世風呂 2。
And in the Kenkyuusha 5th for 菌:
Quote:シイタケの菌を榾木(<sub>ほ</sub><sub>た</sub><sub>ぎ</sub>)に打ち込む。
This is normal. Anki will display it correctly. The image tags are for gaiji characters that I haven't added to my gaiji-to-UTF8 lookup table or cannot add because no equivalent exists. The superscript and subscript tags are a normal part of the entry. If you look the same entries up with EBWin, you will see the same thing.

rich_f Wrote:Probably need something to filter out HTML from sentences...
Just in case I wasn't clear above, the HTML tags are there by design. If you really want to, you can remove them from the import file with a regex. The only tags added are img, br, sup, and sub. I wouldn't recommend removing them unless you want your cards to look messed up.
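If you do decide to strip them, a regex along these lines would cover the four tags listed above. This is a minimal Python sketch, not something shipped with Epwing2Anki; the pattern and function name are illustrative.

```python
import re

# Matches only img, br, sup, and sub tags (opening, closing, or self-closing).
TAG_RE = re.compile(r"</?(?:img|br|sup|sub)\b[^>]*>", re.IGNORECASE)


def strip_tags(text):
    """Remove the img/br/sup/sub tags but keep the text between them."""
    return TAG_RE.sub("", text)


sample = "シイタケの菌を榾木(<sub>ほ</sub><sub>た</sub><sub>ぎ</sub>)に打ち込む。"
cleaned = strip_tags(sample)
```

Keep in mind that removing img tags also removes the gaiji they stand for, and removing br tags joins lines that were meant to be separate.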

rich_f Wrote:but the stuff from the 菌 entry... is that for Ruby? I thought I turned that off. >_>a
Kenkyuusha 5th sometimes uses subscript to indicate reading. The output of the "Expression (with ruby)" field will look something like this: "毒性[どくせい]". This is how ruby is formatted in Anki with the Japanese Support plugin.
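To make the bracket notation concrete, here is a small Python sketch that rewrites it as HTML ruby markup. It only illustrates the format; it is not the Japanese Support plugin's own code, and the simple pattern (a run of kanji followed by a bracketed reading) is an assumption that will not handle headwords with okurigana in parentheses.

```python
import re

# A run of kanji followed by a reading in square brackets, e.g. 毒性[どくせい].
RUBY_RE = re.compile(r"([\u4e00-\u9fff]+)\[([^\]]+)\]")


def to_html_ruby(text):
    """Rewrite kanji[reading] as <ruby>kanji<rt>reading</rt></ruby>."""
    return RUBY_RE.sub(r"<ruby>\1<rt>\2</rt></ruby>", text)


html = to_html_ruby("毒性[どくせい]")
```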
Edited: 2012-07-22, 7:16 pm
#40
omg! thank you so much cb4960! this is EXACTLY what I was looking for ^_^.
Reply
#41
Epwing2Anki says it supports Daijirin 2 version, but it seems the support isn't working.

Besides that, your work is great. Thanks a lot cb4960.
#42
Sebastian Wrote:Epwing2Anki says it supports Daijirin 2 version, but it seems the support isn't working.

Besides that, your work is great. Thanks a lot cb4960.
Hmm, maybe the title of your version of Daijirin 2 is slightly different from the one I have. In the log, look for the "Selected Dictionary:" line and post it here. This will tell me what the title is so that I can add it.

For example, this is my log when adding the unsupported Genius dictionary:

18:53:22.125: Epwing2Anki version: 1.2.0.0
18:53:22.125: Microsoft Windows NT 6.1.7601 Service Pack 1
18:53:22.126: FormMain_Load
18:53:25.477: Selected Dictionary: 『ジーニアス英和〈第3版〉・和英〈第2版〉辞典』
18:53:26.518: Sorry, this EPWING dictionary is not supported yet.
18:53:27.056: FormMain_FormClosing

The 4th line is the one I'm interested in.
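If the log is long, a few lines of Python can pull out just that line. This sketch assumes every log line has the "HH:MM:SS.mmm: message" shape shown in the sample above; the function name is made up.

```python
import re

# Assumes the timestamp format seen in the sample log: "HH:MM:SS.mmm: message".
SELECTED_RE = re.compile(r"^\d{2}:\d{2}:\d{2}\.\d{3}: Selected Dictionary: (.*)$")


def selected_dictionaries(log_text):
    """Return the titles from every 'Selected Dictionary:' line in the log."""
    titles = []
    for line in log_text.splitlines():
        match = SELECTED_RE.match(line.strip())
        if match:
            titles.append(match.group(1))
    return titles


log = """18:53:22.126: FormMain_Load
18:53:25.477: Selected Dictionary: 『ジーニアス英和〈第3版〉・和英〈第2版〉辞典』
18:53:26.518: Sorry, this EPWING dictionary is not supported yet."""
titles = selected_dictionaries(log)
```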
Edited: 2012-07-22, 9:04 pm
#43
Thanks for the info, cb.

How difficult would it be to support EIJIRO in EPWING format? I have it in text format (you can buy the latest edition online for about 2000 yen), and I know there's a linux program out there that will convert it to EPWING... if there was a way to use it with this program in EPWING format, that would be godly. (The only major problem with Eijiro is the sheer quantity of sentences you wind up with for each entry, but better too many to choose from than too few IMO.)
#44
cb4960 Wrote:
Sebastian Wrote:Epwing2Anki says it supports Daijirin 2 version, but it seems the support isn't working.

Besides that, your work is great. Thanks a lot cb4960.
Hmm, maybe the title of your version of Daijirin 2 is slightly different from the one I have. In the log, look for the "Selected Dictionary:" line and post it here. This will tell me what the title is so that I can add it.

For example, this is my log when adding the unsupported Genius dictionary:

18:53:22.125: Epwing2Anki version: 1.2.0.0
18:53:22.125: Microsoft Windows NT 6.1.7601 Service Pack 1
18:53:22.126: FormMain_Load
18:53:25.477: Selected Dictionary: 『ジーニアス英和〈第3版〉・和英〈第2版〉辞典』
18:53:26.518: Sorry, this EPWING dictionary is not supported yet.
18:53:27.056: FormMain_FormClosing

The 4th line is the one I'm interested in.
Curiously enough, the log says:
Quote:Selected Dictionary: 『』
The problem was that Epwing2Anki doesn't recognize the dictionary if its path contains Japanese characters. Renaming the folders using only ASCII characters solved the problem.
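For anyone who wants to check a dictionary path before pointing the program at it, a quick Python sketch follows. The function name and the example paths are made up, and `str.isascii()` needs Python 3.7 or later.

```python
def non_ascii_parts(path):
    """Return the path components that contain non-ASCII characters."""
    # Accept both Windows and POSIX separators.
    parts = [p for p in path.replace("\\", "/").split("/") if p]
    return [p for p in parts if not p.isascii()]


# Hypothetical install location with a Japanese folder name.
bad = non_ascii_parts(r"C:\辞書\Daijirin")
```

An empty result means every folder name in the path is plain ASCII.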

Thank you for your prompt reply.



Edit: Now that I've got it working, I'd like to propose a feature. Epwing2Anki can create a separate card for each example sentence, and each example sentence can correspond to a different sub-definition of the word. It would be great to have a field that contains only the sub-definition corresponding to each example sentence.


For example, something like:

Front:
苦労は筆舌に___・し難い。

Back:
尽(く)す[つく・す]
③ すべて表現し切る。
Edited: 2012-07-22, 11:11 pm
#45
The capabilities and interface of the program are beyond my expectations. We're lucky to have you here, cb. However, one thing I think the "Getting Started" page can benefit from is more clarity on what type of files can be imported. I originally tried importing an HTML file and then a .doc file to no avail, but then everything imported perfectly once I tried a .txt file.

Anyway, thank you for this powerful program. I formerly used the highlight+R feature of Rikaisama (which you also graced us with) to make cards and then manually edited words that pulled up the wrong definition. Now, all I have to do is import a .txt file and Epwing2Anki will automatically let me choose the definition I want. My card-making process is now at least three times as efficient as it was before. Thank you!
#46
Quote:I can probably add support for Kenkyusha shin eiwa-waei chujiten next weekend. It's harder to parse than the other dictionaries though because a lot of the example sentences are behind links so I'll have to add support for this in my eplkup tool.

It's probably going to be a long while before I add support for any web-based dictionaries.
Thank you very much! One of those dictionaries is enough for me ^__^ Zannen for Sanseido, but I guess it's a completely different thing.
Moreover, beggars shouldn't be choosers, and I've already asked for juubun (XD)
Again... Thank you very much! ^______^
#47
Is there a chance to make Epwing2Anki format the .tsv output file conveniently like this?

Expression*|Reading*|Meaning*|Audio*|J-E Sentence**|J-E Sentence TL|JJ

|= is a tab
*from japanese.txt Rikai-sama save file
**i.e. appended from Kenkyusha or any other dict EPWING file
TL=translation
#48
Actually, what I said above was wrong, I think. I can't find any easy way to convert Eijiro (or in my case, Waeijiro) to EPWING that I can get to work. -_-

I tried following the instructions on japaneselanguagetools.com, but I'm coming up with nothing that will work. Those instructions seem to leave out 1 or 2 critical steps. I can't get PDIC to convert the file, and I can't get EBstudio to make the right file. I can't even get PDIC to look at the files that come with Eijiro, for that matter, and they're supposed to work with PDIC. When I look at the text file I got from the Eijiro people, it looks fine. Sadly, Google isn't helping.

Also, is it possible to add a flag for "Full" searches, a la EBWin? I think that may be contributing to some of my "misses" -- for example, 恥をかく doesn't show up in Kenkyuusha 5th edition when I run the program normally. BUT it's in there, just not as a headword. It's found under 恥, with its own definition. Maybe if a definition or sentences don't show up using the regular mode, try again with Full mode? (Because using Full mode on everything a) is overkill and b) slows everything down considerably.)
#49
Oh, I think I figured out at least *part* of my problem with PDIC. The ini file that comes with Eijiro has to be installed in the main PDIC program directory with the other Eijiro PDIC files-- so just copy everything in the PDIC-UNI directory and paste it in the program's directory.

Now to try to figure out the rest of it.
#50
Sebastian Wrote:Edit: Now that I've got it working, I'd like to propose a feature. Epwing2Anki can create a separate card for each example sentence, and each example sentence can correspond to a different sub-definition of the word. It would be great to have a field that contains only the sub-definition corresponding to each example sentence.


For example, something like:

Front:
苦労は筆舌に___・し難い。

Back:
尽(く)す[つく・す]
③ すべて表現し切る。
Sounds like a good idea. I'll try to add it as a fine-tune option in some future release.