kanji koohii FORUM
dictscrape: library/anki plugin for semi-automatic card creation - Printable Version



dictscrape: library/anki plugin for semi-automatic card creation - rich_f - 2012-07-10

That reminds me... I have a Mac Mini I never use. Maybe lxml will have a better time on that? If I have time today, I'll give it a shot.

I'll also try uninstalling Python 2.7.3 on Windows and reinstalling everything. Something could've gotten corrupted along the way.


dictscrape: library/anki plugin for semi-automatic card creation - partner55083777 - 2012-07-10

If you give it a shot let me know how it turns out. I guess I've just been spoiled by Linux. Everything is so easy with a package manager. If you can't get it working I'll have to find some way to play around with it on Windows.


dictscrape: library/anki plugin for semi-automatic card creation - theadamie - 2012-07-10

can this work with other dictionaries? namely naver's kor-eng dictionary with the hanja "kanji" also put in?


dictscrape: library/anki plugin for semi-automatic card creation - rich_f - 2012-07-10

Well, Windows has its own Python package manager, easy_install. But it's not as robust about figuring out what dependencies each package has, and it won't go and get them for you. And then there are some packages that you can't install with easy_install. So it's really a mess out there for people who don't have a high level of sophistication with this sort of thing.


dictscrape: library/anki plugin for semi-automatic card creation - rich_f - 2012-07-10

You know, lxml is a PITA to install on Mac OS X as well. -_-

I can't get it to install, and I don't have an afternoon to figure it out. Just let me know when it runs on something not powered by Linux.


dictscrape: library/anki plugin for semi-automatic card creation - rich_f - 2012-07-12

Okay, I had a couple of hours free, so I dumped Fedora 16 on a box I'm not using, spent another couple of hours getting everything to run, and got everything installed properly to the point where the plugin works!

It looks nice in Fedora. The problem I'm having with my Fedora install is that it's slow as mud, and I'm running off of an SSD. (Just updated the firmware, too. No help.) Ubuntu was even worse, so I don't know what's up with that box.

But the plugin works. I used it a bit, and it's nice. Here's my feedback on the interface:

Generally, everything needs to be more streamlined. In the final version, you'll want something that works similarly to YomiChan, where you can assign which fields will go where automatically in Preferences. Setting up a new model/deck just for this is a little tricky. I'd like to be able to dump sentences in my current deck as it is, and leave it to me to figure out in which fields I want to stick the example sentences, translations, and definitions. I may want some of the stuff in the "word" deck in every sentence card, all crammed in one field, and I may want to forgo the whole idea of a "word" deck. (But I know it's easier to import that way, so I know it's important for alpha.)

I suppose we could just export the deck to .csv, then import the bits we want into our regular decks, but that adds extra work.

It might be easier to have a way to read in a text file list of words and kana and generate the appropriate fields. (So again, something akin to something else that YomiChan does-- it reads a txt file in Anki and lets you work on it, just in a slightly different manner.) That's kind of what it's doing now, it's just a matter of getting the text/kana from a txt file the user selects as opposed to a deck.

I think being able to use the fields you scrape as you like is important. So if I want 10 cards for 勉強, and I just want to study sample sentences, then I would want to have the scraper plunk the word, the word in kana, and the definition in the same place in all of my cards. In my main deck, I just have a "meaning" field where I dump the sentence translation (if needed), and any definitions I want. Again, this is something that should be configurable in preferences.

I thought of something else-- not everyone wants English translations. If there's a way to turn off English translation selection in the preferences for people who just want the sentences (for the folks who are big on monolingual), then that would be an attractive feature. I'm not picky about that, but I know some people are. They could then run the sentence cards through the sentence glosser plugin to get definitions in JP if they want.

Also, a button to start the plugin from the Anki main screen would be great. Or something along with Ctrl-J.

Ah, I know what would also be great-- an input box in the plugin window, so you can create cards on the fly. Say, for example, I'm reading a book in Yomichan, and I come across a word I *really* want to learn. I want to be able to copy the word, open DictScrape, pop it in the input window, and generate 10 sentence cards of it. (Then maybe flag some of them to run through the Sentence Glosser to pick up all of the definitions of some of the trickier sentences.)

Anyway, these are some random thoughts for now, so you get feedback ASAP. Even as an alpha, it's great. I think the most important thing is to get it so it will run on Win/Mac so you can get more testers.

EDIT: Also, periods. I like JP-style periods in my sentences. Dictscrape seems to cut them off for some reason. (Maybe something else for Preferences?)

EDIT2: Figured out why my box was running so slow-- it's a Celeron box I've been meaning to turn into a plain vanilla fileserver.

EDIT 3: Got it running on a faster box in Ubuntu 12.04LTS. The instructions on this page: http://tightwadtechnica.com/?page_id=4163
were *really* useful in getting lxml et al. running. (Use apt-get instead of aptitude, but it works with aptitude as well, if you install aptitude.)

I'll do some more testing over the next few days, and see what I can break.


dictscrape: library/anki plugin for semi-automatic card creation - rich_f - 2012-07-12

Ok, I've got a sort of workflow down, and it seems to be working okay. Running it on a faster box makes a big difference.

One other thing that I think would be great: an alc.co.jp lookup, because I'm running into words that don't show up in Yahoo. (And I'm pretty sure they'll show up there. ALC has all kinds of stuff on it. Only problem with ALC is that you get *pages* of output.)

EDIT: Here's what I'm talking about-- searching for 外資系 doesn't yield good results, but you can find it in the definitions of 外資 in one of the dictionaries.


dictscrape: library/anki plugin for semi-automatic card creation - partner55083777 - 2012-07-13

Thanks for going through a lot of work to check this out. You've encouraged me to finally play around with this on Windows. I got it working but it's kind of a pain. Here are the steps to get it working on Windows.

1. Install Anki.

2. Install Python 2.7.3 from http://www.python.org/download/. Make sure you install Python 2.7.

3. Install lxml from http://www.lfd.uci.edu/~gohlke/pythonlibs/#lxml. I installed "lxml-2.3.4.win32-py2.7.exe". Make sure you install a version for Python 2.7.

4. Copy the files needed for the plugin to anki's plugin directory ("C:\Documents and Settings\USERNAME\Application Data\.anki\plugins\"). It's really annoying you can't use links (shortcuts) like you can on Linux. You need these files to exist:

C:\Documents and Settings\USERNAME\Application Data\.anki\plugins\scraper.py
C:\Documents and Settings\USERNAME\Application Data\.anki\plugins\scraper_gui\
C:\Documents and Settings\USERNAME\Application Data\.anki\plugins\scraper_gui\dictscrape

5. This is where it gets annoying. You need lxml to be somewhere Anki can see it so it knows to import it. Copy lxml to your plugins directory. So basically, just copy the directory C:\Program Files\Python27\Lib\site-packages\lxml\ to the Anki plugins directory. As a result, you should have this directory:

C:\Documents and Settings\USERNAME\Application Data\.anki\plugins\lxml\

6. This should be all you need for the plugin to work... Unfortunately, it's not. It still doesn't work. It appears that lxml was packaged incorrectly by the ~gohlke site, so when Anki goes to actually load lxml, it can't find _elementpath.py, despite it obviously being in the lxml directory. We need to copy _elementpath.py to our anki plugins directory as well. So basically just copy C:\Program Files\Python27\Lib\site-packages\lxml\_elementpath.py to your anki plugins directory. You should have this file:

C:\Documents and Settings\USERNAME\Application Data\.anki\plugins\_elementpath.py


That's it. You should now be able to open Anki and not be presented with any errors.
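
Steps 5 and 6 could probably be scripted instead of done by hand. The following is only a rough sketch: the function name copy_lxml_into_plugins is made up for illustration, both paths are assumptions you would have to adjust to your own Python install and Anki profile, and it only repeats the manual copying described above.

```python
import os
import shutil


def copy_lxml_into_plugins(site_packages, plugins_dir):
    """Copy the lxml package and _elementpath.py into Anki's plugins directory.

    Both arguments are paths supplied by the caller (assumptions, not
    auto-detected): the site-packages directory that contains lxml, and
    the Anki plugins directory from step 4.
    """
    src_pkg = os.path.join(site_packages, "lxml")
    dst_pkg = os.path.join(plugins_dir, "lxml")

    # Step 5: copy the whole lxml package next to the plugin files.
    if os.path.isdir(dst_pkg):
        shutil.rmtree(dst_pkg)  # replace a stale copy from an earlier attempt
    shutil.copytree(src_pkg, dst_pkg)

    # Step 6: also put _elementpath.py at the top level of the plugins directory.
    src_mod = os.path.join(src_pkg, "_elementpath.py")
    dst_mod = os.path.join(plugins_dir, "_elementpath.py")
    shutil.copy(src_mod, dst_mod)
    return dst_pkg, dst_mod
```

You would call it with the site-packages path from step 5 and the plugins path from step 4, written as raw strings so the backslashes survive.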

It took me a couple hours to figure this out... >_> This process really made me glad I don't have to use Windows on a daily basis.


dictscrape: library/anki plugin for semi-automatic card creation - rich_f - 2012-07-13

お疲れ様です。

I'll give the Windows method a whack this weekend.

Right now, I'm using my Ubuntu box to bang through a list of about 150 words I found in the digital version of KO2001 volume 1 that I never bothered to enter into my deck. They're a mix of common and uncommon words. The plugin makes this a *much* faster task, as long as the words I scrape are 2 kanji or fewer. For some reason, the Yahoo JP-EN dictionaries don't like words of 3 kanji or more. Sometimes they show up, but usually they don't. The 国語 dictionaries don't do sentences well, although the longer words will usually (not always) show up.

EDIT: Doh. I thought that you couldn't enter a custom definition. You can. I'm a little brain dead today. Just got the hang of it. It's pretty good.

Need to be able to copy the notes field when <2 sentences found, so external website sentences can be entered with good notes field. If there was a text box with the "Notes" field that dictscrape creates that I could copy/paste into sentences I find on different web sites like ALC, that would *really* be useful. Sometimes I only find one useful sentence, so the "Notes" field that you get for sentence cards doesn't generate. That's really handy to have pre-generated. Just copy to clipboard and go.

Edit: bolded stuff for clarity.
Edit2: removed brain-dead entry.


dictscrape: library/anki plugin for semi-automatic card creation - partner55083777 - 2012-07-14

Let me respond to some of your previous posts since I didn't have time last night.

rich_f Wrote:Generally, everything needs to be more streamlined. In the final version, you'll want something that works similarly to YomiChan, where you can assign which fields will go where automatically in Preferences. Setting up a new model/deck just for this is a little tricky. I'd like to be able to dump sentences in my current deck as it is, and leave it to me to figure out in which fields I want to stick the example sentences, translations, and definitions. I may want some of the stuff in the "word" deck in every sentence card, all crammed in one field, and I may want to forgo the whole idea of a "word" deck. (But I know it's easier to import that way, so I know it's important for alpha.)
Yes, this is basically the idea, although I think you're conflating the terms a little bit. Right now the plugin uses two different models, a "Words" model and a "Sentences" model. It is set up like this because this is exactly how my own deck was set up.

After Anki2 is released, I plan on porting the plugin to Anki2 and adding the things you are suggesting. There will be a preferences screen where you can pick what models you want to use and what information you want stored in what field in your model. I plan on adding a way to make vocab cards (like the "Words model" being used now) and sentence cards (like the "Sentences" model). I plan on making it so that you don't have to make sentence cards if you don't want to, and you don't have to make vocab cards if you don't want to. I think this is basically what you're talking about?

Quote:It might be easier to have a way to read in a text file list of words and kana and generate the appropriate fields. (So again, something akin to something else that YomiChan does-- it reads a txt file in Anki and lets you work on it, just in a slightly different manner.) That's kind of what it's doing now, it's just a matter of getting the text/kana from a txt file the user selects as opposed to a deck.
I'm not sure I'll add this immediately. It's not exactly a use case that I personally have to deal with. I would probably accept patches for this kind of functionality, but it's not pressing for me personally. I'll have to think about this a little more.

(In the meantime you could just take your text file of words and import it into Anki as the "Words" model. Then go into the card browser, sort all cards by creation date, and scroll to the most recently created cards. The cards you just imported should be there. You can then use the plugin like normal. It's slightly more work this way, though.)

Quote:I think being able to use the fields you scrape as you like is important. So if I want 10 cards for 勉強, and I just want to study sample sentences, then I would want to have the scraper plunk the word, the word in kana, and the definition in the same place in all of my cards. In my main deck, I just have a "meaning" field where I dump the sentence translation (if needed), and any definitions I want. Again, this is something that should be configurable in preferences.
I'm not sure what you mean by this. Could you give me a concrete example of what you want your cards to look like, what fields you want to have, and a concrete example of what data you want in each field?

Quote:I thought of something else-- not everyone wants English translations. If there's a way to turn off English translation selection in the preferences for people who just want the sentences (for the folks who are big on monolingual), then that would be an attractive feature. I'm not picky about that, but I know some people are. They could then run the sentence cards through the sentence glosser plugin to get definitions in JP if they want.
I'm also not 100% sure what you mean when you're talking about the sentence glosser plugin. Doesn't the sentence glosser plugin query wwwjdic to get English definitions?

For now, if you don't want to see the English definition, you could just not include it in your card model (although I'm sure you're aware of this).

Quote:Also, a button to start the plugin from the Anki main screen would be great. Or something along with Ctrl-J.

Ah, I know what would also be great-- an input box in the plugin window, so you can create cards on the fly. Say, for example, I'm reading a book in Yomichan, and I come across a word I *really* want to learn. I want to be able to copy the word, open DictScrape, pop it in the input window, and generate 10 sentence cards of it. (Then maybe flag some of them to run through the Sentence Glosser to pick up all of the definitions of some of the trickier sentences.)
Yeah, I plan on adding this. Probably after porting to Anki2.

You also gave me a good idea. At the bottom of each of the screens where you can edit the sentence that will be created, there should be a checkbox that allows you to add some tag to each card. Then, when you're creating your cards, you can just check which cards you want to tag. After creating all the cards, you can go to the card browser, search for all the cards that have been tagged, and run them through the sentence glosser. I imagine that would make your workflow a little easier.

Quote:Also, periods. I like JP-style periods in my sentences. Dictscrape seems to cut them off for some reason. (Maybe something else for Preferences?)
Actually, this really ticks me off too! I like Japanese-style periods at the end of my sentences, but the Yahoo dictionaries cut them off. For instance, if you check out 中枢(ちゅうすう) in the Progressive dictionary, the example sentences don't have periods at the end! I had originally thought about just adding the periods in myself, but I decided against it. Some of the example "sentences" really aren't full sentences, so they shouldn't have periods. For instance, if you look up 中枢 in Progressive, you will find the example sentence "社会の中枢", which is really just a phrase. You can see that the English sentence also doesn't have a period....

Maybe I just thought of a way to solve this. If the English sentence has a period, then the Japanese sentence should also get a period after it... I'll play around with this tonight and see if I can get it working...
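
The idea could be sketched roughly like this. It is not the code that ended up in the plugin: the function name add_jp_period is made up, the check for existing punctuation is an extra assumption, and it is written for Python 3 rather than the Python 2.7 the plugin runs under.

```python
JP_PERIOD = "\u3002"  # the Japanese full stop


def add_jp_period(japanese, english):
    """Append a Japanese period only if the English translation ends with a period."""
    japanese = japanese.rstrip()
    english = english.rstrip()
    if not japanese:
        return japanese
    # No period on the English side: treat the entry as a phrase and leave it alone.
    if not english.endswith("."):
        return japanese
    # Assumption: do not double up if the Japanese text already ends in punctuation.
    if japanese.endswith((JP_PERIOD, "\uff01", "\uff1f", "!", "?")):
        return japanese
    return japanese + JP_PERIOD
```

With this rule a phrase like 社会の中枢 stays untouched as long as its English side has no period, while a full sentence whose translation ends in a period gets one added.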


dictscrape: library/anki plugin for semi-automatic card creation - partner55083777 - 2012-07-14

rich_f Wrote:One other thing that I think would be great: an alc.co.jp lookup, because I'm running into words that don't show up in Yahoo. (And I'm pretty sure they'll show up there. ALC has all kinds of stuff on it. Only problem with ALC is that you get *pages* of output.)

EDIT: Here's what I'm talking about-- searching for 外資系 doesn't yield good results, but you can find it in the definitions of 外資 in one of the dictionaries.
After I get the plugin ported to Anki2 and add in Preferences, I'm going to start adding in other dictionaries. Right now I plan on adding in support for the definitions from http://www.sanseido.net/ and the example sentences from http://weblio.jp/. I'll also add in support for entries from edict, and maybe example sentences from the wwwjdic site (is this just sentences from tatoeba or something? I dunno).

The example sentences from alc.co.jp look similar to those from weblio.jp. Do you know what the differences are?

Also, after taking a look at ALC, it looks like it will be difficult to parse the definitions/example sentences. For instance, if you look up just "外資", it looks like the first entry is a list of definitions, and then there are around 10 or so example sentences, and then you get to another word with a couple of definitions ("外資導入"). If I added this to the plugin, I'm wondering how to show all of these sentences/definitions? How should I format them? The example sentences from weblio.jp will be easy. I'll just have one definition entry with all of the example sentences. However, ALC looks more complicated.


dictscrape: library/anki plugin for semi-automatic card creation - partner55083777 - 2012-07-14

rich_f Wrote:I'll give the Windows method a whack this weekend.
Okay, thanks. Let me know how it goes. I did it on Windows XP, so it might be a little different on Windows 7.

Quote:Right now, I'm using my Ubuntu box to bang through a list of about 150 words I found in the digital version of KO2001 volume 1 that I never bothered to enter into my deck. They're a mix of common and uncommon words. The plugin makes this a *much* faster task, as long as the words I scrape are 2 kanji or fewer. For some reason, the Yahoo JP-EN dictionaries don't like words of 3 kanji or more. Sometimes they show up, but usually they don't. The 国語 dictionaries don't do sentences well, although the longer words will usually (not always) show up.
That's unfortunate. I guess I don't add those kinds of words that much, so I never realized that the Yahoo dictionaries were short on those types of words.

Quote:Need to be able to copy the notes field when <2 sentences found, so external website sentences can be entered with good notes field. If there was a text box with the "Notes" field that dictscrape creates that I could copy/paste into sentences I find on different web sites like ALC, that would *really* be useful. Sometimes I only find one useful sentence, so the "Notes" field that you get for sentence cards doesn't generate. That's really handy to have pre-generated. Just copy to clipboard and go.
I can see why you'd want this. Let me see what I can do about it.


dictscrape: library/anki plugin for semi-automatic card creation - partner55083777 - 2012-07-14

Okay, so I added jp-style periods to the entries in the Progressive dictionary. It's in commit 40627f55. If you pull from master you should get the update.


dictscrape: library/anki plugin for semi-automatic card creation - partner55083777 - 2012-07-14

rich_f Wrote:Need to be able to copy the notes field when <2 sentences found, so external website sentences can be entered with good notes field. If there was a text box with the "Notes" field that dictscrape creates that I could copy/paste into sentences I find on different web sites like ALC, that would *really* be useful. Sometimes I only find one useful sentence, so the "Notes" field that you get for sentence cards doesn't generate. That's really handy to have pre-generated. Just copy to clipboard and go.
This has been added. It's in commit 0b99f9b. Check it out and let me know if it's what you were imagining.


dictscrape: library/anki plugin for semi-automatic card creation - partner55083777 - 2012-07-14

theadamie Wrote:can this work with other dictionaries? namely naver's kor-eng dictionary with the hanja "kanji" also put in?
Hey, sorry I never got to this.

I've set this plugin (and library) up so that it's possible to work with other dictionaries and other languages. However, support for other dictionaries has to be added programmatically. That is to say, someone actually has to sit down and write the code for parsing the information from the dictionary. As I'm only studying Japanese, there is no support for other languages/dictionaries at the moment.

That being said, I would be willing to accept patches and support someone trying to add in support for other dictionaries. If you have any experience programming and wanted to give it a shot, let me know.
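
To give a feel for what "writing the code for parsing" means, here is a very rough sketch. It is not dictscrape's actual code or API: the class name, the function name, and the <li class="example"> markup are all invented for illustration, and it uses Python 3's standard html.parser rather than lxml. A real dictionary page would need its own rules.

```python
from html.parser import HTMLParser


class ExampleSentenceParser(HTMLParser):
    """Collect the text of every <li class="example"> element (made-up markup)."""

    def __init__(self):
        super().__init__()
        self.sentences = []
        self._inside = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs.
        if tag == "li" and ("class", "example") in attrs:
            self._inside = True
            self._buffer = []

    def handle_data(self, data):
        if self._inside:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "li" and self._inside:
            self.sentences.append("".join(self._buffer).strip())
            self._inside = False


def parse_examples(html):
    """Return the example sentences found in one (hypothetical) result page."""
    parser = ExampleSentenceParser()
    parser.feed(html)
    parser.close()
    return parser.sentences
```

Most of the work for a new dictionary is figuring out which elements hold the definitions and sentences, and dealing with the pages that don't follow the pattern.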


dictscrape: library/anki plugin for semi-automatic card creation - rich_f - 2012-07-14

Okay, I'll try to run the new files this evening. I love what the plugin does. It saves me a ton of time, and it makes me want to add vocab/sentence cards again, because those cards are generally a PITA to make.

After using the program for 3-4 hours more, I have a better idea of what works. I'll run the new files in a couple of hours and get back with more impressions.

Let me revise and clarify my initial impressions for now. A lot of what I was doing was thinking out loud, and a lot of it is more wish list than complaint, so don't stress too much. Anki 2.0 is coming, and the most important thing is getting it to run in 2.0.

Glad to hear that preferences are coming, and some of these new features sound *very* cool.
---------
Interface stuff: These are just cosmetic things that bugged me with the interface: I want an on-screen button I can hit easily to start the plugin, and the ability to start it with an F-key, or something easier to type than Ctrl-J. (Ctrl-J feels weird.)

I also want the ability to get rid of confirmation windows in the preferences. If I don't find my word, I cancel, then get a confirmation pop-up. Some people may want that if they already have data in the cards, but if I'm just canceling out of a failed search, it gets annoying.
----------
I agree on not needing the txt file thing for now. After using it for a while, I realized that I can just export the resulting deck as a CSV, edit it and chop out the fields I don't want, and just import it into my deck. I was thinking in terms of "what I want down the road," because it would be nice to have in terms of workflow. (Easier than loading another deck, importing my list, doing my work, then exporting, importing to a spreadsheet program, editing, exporting, reopening in something to convert it to UTF-8, then finally importing into my main Anki deck.)
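
The spreadsheet round trip in that workflow could probably be replaced by a few lines of script. A minimal sketch, assuming Python 3, a comma-separated export, and 0-based column numbers; the function name keep_columns and its parameters are made up, and the delimiter and source encoding have to match whatever the export actually produces:

```python
import csv
import io


def keep_columns(src_path, dst_path, keep, delimiter=",", src_encoding="utf-8"):
    """Copy only the columns listed in `keep` to dst_path, always writing UTF-8.

    `keep` is a list of 0-based column indexes in the order they should
    appear in the output. Rows that are too short get an empty field.
    Returns the number of rows written.
    """
    rows = 0
    with io.open(src_path, "r", encoding=src_encoding, newline="") as src, \
            io.open(dst_path, "w", encoding="utf-8", newline="") as dst:
        reader = csv.reader(src, delimiter=delimiter)
        writer = csv.writer(dst, delimiter=delimiter, lineterminator="\n")
        for row in reader:
            writer.writerow([row[i] if i < len(row) else "" for i in keep])
            rows += 1
    return rows
```

The output file is UTF-8 regardless of the input encoding, so it should be ready to import without reopening it in another program.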

What I was thinking was along the lines of providing the plugin a txt file as input, it opens up in a window in the plugin, and I select the word I want to scrape, instead of using a separate vocab model for input. The scraper comes up, and runs normally-- results go straight into the active deck, with the plugin's output fields mapped and formatted to whatever format you have. Kind of like YomiChan, but much more robust in the dictionary dept.

I've had the same deck since 2008. I keep it really simple: 4 fields. Expression, Meaning, Reading (auto-generated by Anki), and a 4th field I don't use. So when I run the plugin, using a txt file as input, I'd want to be able to map the output of the plugin to the fields in my deck, with html/text formatting.

So, %V, %VK, %VE, %S, %SE, %I, %N for the various values that dictscrape will generate, and I can map, say: %S to my expression field; something like: %SE{br}{br}%V (%VK) (%I) -- %VE all to my Meaning field; and Anki will handle generating the Reading field I use as it sucks in the cards that are created.

That's what I'm getting at, and that's what I was talking about with "using the fields as you want." That would significantly streamline stuff, but it would probably require a lot of work. This is a wish list thing.
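
To make the mapping idea concrete, here is a small sketch of how such a template might be filled in. It assumes Python 3; the function name render_field is made up, and what each placeholder stands for (for example %V for the word and %VK for its kana) is only a guess from the names, not something the plugin defines.

```python
import re

# Placeholder names are taken from the post above; their meanings are assumed.
PLACEHOLDERS = ("V", "VK", "VE", "S", "SE", "I", "N")

# Longest names first, so that %VK is not read as %V followed by a literal "K".
_PATTERN = re.compile(
    "%(" + "|".join(sorted(PLACEHOLDERS, key=len, reverse=True)) + ")"
)


def render_field(template, values):
    """Fill %-placeholders from `values` and turn {br} into an HTML line break."""
    def substitute(match):
        # Placeholders with no value become an empty string.
        return values.get(match.group(1), "")

    # Replace {br} in the template first so scraped text is never altered.
    return _PATTERN.sub(substitute, template.replace("{br}", "<br>"))
```

Called with the Meaning template from above, "%SE{br}{br}%V (%VK) (%I) -- %VE", and a dict of scraped values, it returns one HTML string for that field.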
-----------
Yeah, ALC would be more complicated. Thinking about it, some entries go on for many pages, and that might crash/slow down the plugin. (Maybe a cutoff in the prefs?) I run into a lot of words for which I can only find examples in either my full-blown Kenkyuusha electronic dictionary or on ALC's website. Obviously, ALC is easier to copy/paste from.
----------
Yeah, sentence glosser uses EDICT. Forgot about that. It would be nice if there was a way to use something else besides EDICT for glossing.
-----------
English definitions: What you say makes sense.
-----------
I'll get to work on the new version tonight!


dictscrape: library/anki plugin for semi-automatic card creation - rich_f - 2012-07-14

I'll keep editing this as I find words that scrape funny.


行動 こうどう is acting buggy in the new version. The Progressive results point to 行動半径, which is halfway down the results page of 行動. It's probably because the formatting is a bit weird for that page.

Also, 先方 せんぽう is acting a little off as well. There's lots of random JP in the New Century definition.

帰宅 きたく gives weird results. New Century's scrape shows: <:g11f1><:g11f1>帰宅する for the definition. Probably because of the 名 in a box character in the definition field. I can't even copy/paste it.

中央 ちゅうおう yields some odd results as well in the progressive, namely this bit:
After this sentence: 我々会社の末端の人間には中央の意向は分からない
We ordinary workers don't know what the higher-ups in the company are trying to do.

It skips all of the entries until you get this fragment: 行政)官庁. The fragment is missing 中央( so something is going on there.

中絶 ちゅうぜつ also comes up a little odd in Progressive's scrape.


dictscrape: library/anki plugin for semi-automatic card creation - theadamie - 2012-07-14

no programming experience... even though that site is korean the eng-jap dictionary is really good. you should consider adding support for naver at some point.


dictscrape: library/anki plugin for semi-automatic card creation - partner55083777 - 2012-07-15

rich_f Wrote:I also want the ability to get rid of confirmation windows in the preferences. If I don't find my word, I cancel, then get a confirmation pop-up. Some people may want that if they already have data in the cards, but if I'm just canceling out of a failed search, it gets annoying.
Okay, I took out the confirmation window. It's commit 1c0ba82. It was kind of annoying to me too. It should be easy to add it back in when I get the preferences set up.


dictscrape: library/anki plugin for semi-automatic card creation - partner55083777 - 2012-07-15

rich_f Wrote:行動 こうどう is acting buggy in the new version. The Progressive results point to 行動半径, which is halfway down the results page of 行動. It's probably because the formatting is a bit weird for that page.
This has been fixed in 5dbdd5b. It would have been annoying to fix correctly, so I just had to take out the offending entry.

rich_f Wrote:Also, 先方 せんぽう is acting a little off as well. There's lots of random JP in the New Century definition.
This has been fixed in 7951e95.


dictscrape: library/anki plugin for semi-automatic card creation - partner55083777 - 2012-07-15

rich_f Wrote:帰宅 きたく gives weird results. New Century's scrape shows: <:g11f1><:g11f1>帰宅する for the definition. Probably because of the 名 in a box character in the definition field. I can't even copy/paste it.
This has been fixed in b79696c.


dictscrape: library/anki plugin for semi-automatic card creation - partner55083777 - 2012-07-15

rich_f Wrote:中央 ちゅうおう yields some odd results as well in the progressive, namely this bit:
After this sentence: 我々会社の末端の人間には中央の意向は分からない
We ordinary workers don't know what the higher-ups in the company are trying to do.

It skips all of the entries until you get this fragment: 行政)官庁. The fragment is missing 中央( so something is going on there.
This has been fixed in 592e006.


dictscrape: library/anki plugin for semi-automatic card creation - partner55083777 - 2012-07-15

rich_f Wrote:中絶 ちゅうぜつ also comes up a little odd in Progressive's scrape.
This has been fixed in 26933a5.


dictscrape: library/anki plugin for semi-automatic card creation - rich_f - 2012-07-15

I double-checked the fixes, and they all look good here. I'll keep testing it, and let you know what breaks. I like the new features a lot. It's getting easier and easier.

One thought about using ALC-- set a limit on the number of entries returned, or set a limit on the length of sentences returned... but I guess it's still a pain, because the multi-page results would take too long to scrape, etc. (I've had ALC give me thousands of results on dozens of pages. Not common, but scary.) One thing in its favor-- the layout is a little more regular than Yahoo, or so it seems.
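
The cutoff idea above could look something like the following. This is only a rough Python 3 sketch, not how dictscrape actually works: `collect_entries`, `fetch_page`, and the three limit parameters are hypothetical names, and the fake page source stands in for real ALC requests.

```python
def collect_entries(fetch_page, max_pages=3, max_entries=50, max_sentence_len=120):
    """Gather (japanese, english) pairs, stopping early at any limit."""
    entries = []
    for page_number in range(1, max_pages + 1):
        page = fetch_page(page_number)  # hypothetical: returns a list of pairs
        if not page:
            break  # ran out of result pages
        for japanese, english in page:
            if len(japanese) > max_sentence_len:
                continue  # skip overly long sentences
            entries.append((japanese, english))
            if len(entries) >= max_entries:
                return entries  # entry limit reached, stop scraping
    return entries


# Stand-in for a real scraper: 100 pages of 10 made-up entries each.
requested_pages = []

def fake_fetch_page(page_number):
    requested_pages.append(page_number)
    if page_number > 100:
        return []
    return [("例文%d-%d" % (page_number, i), "example %d-%d" % (page_number, i))
            for i in range(10)]

results = collect_entries(fake_fetch_page, max_pages=3, max_entries=25)
print(len(results), requested_pages)
```

Because the page limit caps the number of requests no matter how many results exist, a search that returns dozens of pages costs no more than one that returns three.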


dictscrape: library/anki plugin for semi-automatic card creation - partner55083777 - 2012-07-16

Let me respond to your post since I didn't yesterday.

rich_f Wrote:Interface stuff: These are just cosmetic things that bugged me with the interface: I want an on-screen button I can hit easily to start the plugin, and the ability to start it with an F-key, or something easier to type than ctrl-j. (Ctrl-J feels weird.)
Yes, I completely agree with you. I'm planning on figuring this out when Anki2 comes out.

rich_f Wrote:I agree on not needing the txt file thing for now. After using it for a while, I realized that I can just export the resulting deck as a CSV, edit it and chop out the fields I don't want, and just import it into my deck. I was thinking in terms of "what I want down the road," because it would be nice to have in terms of workflow. (Easier than loading another deck, importing my list, doing my work, then exporting, importing to a spreadsheet program, editing, exporting, reopening it in something to convert it to UTF-8, then finally importing into my main Anki deck.)
I can see how this would be helpful to you. It shouldn't be too difficult to add functionality like this, but it would take some time. If I forget about it, please remind me. I'd probably only add this after porting the plugin to Anki2.

rich_f Wrote:What I was thinking was along the lines of providing the plugin a txt file as input, it opens up in a window in the plugin, and I select the word I want to scrape, instead of using a separate vocab model for input. The scraper comes up, and runs normally-- results go straight into the active deck, with the plugin's output fields mapped and formatted to whatever format you have. Kind of like YomiChan, but much more robust in the dictionary dept.

I've had the same deck since 2008. I keep it really simple: 4 fields. Expression, Meaning, Reading (auto-generated by Anki), and a 4th field I don't use. So when I run the plugin, using a txt file as input, I'd want to be able to map the output of the plugin to the fields in my deck, with html/text formatting.

So, %V, %VK, %VE, %S, %SE, %I, %N for the various values that dictscrape will generate, and I can map, say: %S to my expression field; something like: %SE{br}{br}%V (%VK) (%I) -- %VE all to my Meaning field; and Anki will handle generating the Reading field I use as it sucks in the cards that are created.

That's what I'm getting at, and that's what I was talking about with "using the fields as you want." That would significantly streamline stuff, but it would probably require a lot of work. This is a wish list thing.
Yeah, this is pretty much exactly what I have planned. This is pretty much the biggest reason why I wanted to add preferences. I'll probably need you to help me test this out though. Just to make sure it works for you as well as me.
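
A minimal Python 3 sketch of the kind of field mapping described in the quote, under stated assumptions: `render_field`, `scraped`, and `field_map` are hypothetical names, the sample values are made up, what each placeholder actually holds is a guess, and `{br}` is assumed to mean an HTML line break. It is not the plugin's real code.

```python
import re

def render_field(template, values):
    """Fill %-placeholders in a field template from a dict of scraped values."""
    # Try longer names first so %VK is not read as %V followed by "K".
    names = sorted(values, key=len, reverse=True)
    pattern = re.compile("%(" + "|".join(re.escape(n) for n in names) + ")")
    text = pattern.sub(lambda match: values[match.group(1)], template)
    # Assumption: {br} in a template stands for an HTML line break.
    return text.replace("{br}", "<br>")


# Made-up values; the meaning assigned to each placeholder is illustrative only.
scraped = {
    "V": "行動", "VK": "こうどう", "VE": "action",
    "S": "(example sentence)", "SE": "(sentence translation)",
    "I": "noun", "N": "(notes)",
}
# One template per deck field, as in the mapping from the quoted post.
field_map = {
    "Expression": "%S",
    "Meaning": "%SE{br}{br}%V (%VK) (%I) -- %VE",
}
card = {field: render_field(tpl, scraped) for field, tpl in field_map.items()}
print(card["Meaning"])
```

Keeping the templates in a plain dict means the mapping could later be loaded from the preferences without touching the rendering function.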

rich_f Wrote:Yeah, ALC would be more complicated. Thinking about it, some entries go on for many pages, and that might crash/slow down the plugin. (Maybe a cutoff in the prefs?) I run into a lot of words that I can only find examples for in either my full-blown Kenkyuusha electronic dictionary, or on ALC's website. Obviously, ALC is easier to copy/paste from.
The problem with ALC is that I'm not sure exactly how to parse the entries. Like I was saying before, if you look up 外資, the first "entry" looks like a definition of 外資. The following entries look like example sentences, until you get to the 17th entry, which appears to be another definition (外資導入). How should I format all of this in my program? Should I just add the definition of 外資 as a definition, and then add the rest of the entries as example sentences? What should I do for something like 外資導入 where there are multiple definitions? Just add them all as English translations? Something like this:

Quote:...
外資労協
Japan Foreign Affiliated Trade Union

外資取り入れ
intake of foreign capital

外資導入
capital import/capital introduction/foreign capital inflow/introduction of foreign capital

外資流入
influx of foreign capital
...
Can you think of a better way to do it?
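
One possible way to sort the entries, sketched in Python 3 under loud assumptions: an entry whose Japanese side equals the search term is treated as a definition, an entry ending in sentence punctuation as an example sentence, and anything else as a compound with its "/"-separated glosses split apart. `classify_alc_entries` is a hypothetical name, the rule is only a heuristic, and the first sample entry is a made-up stand-in for the headword entry; the rest come from the quote above.

```python
def classify_alc_entries(query, entries):
    """Sort (japanese, english) pairs into definitions, compounds, examples."""
    result = {"definitions": [], "compounds": [], "examples": []}
    for japanese, english in entries:
        # Split "a/b/c" style glosses into separate translations.
        glosses = [g.strip() for g in english.split("/") if g.strip()]
        if japanese == query:
            result["definitions"].extend(glosses)
        elif japanese.endswith(("。", "？", "！")):
            # Heuristic: sentence-final punctuation marks an example sentence.
            result["examples"].append((japanese, english))
        else:
            result["compounds"].append((japanese, glosses))
    return result


sample = [
    ("外資", "foreign capital"),  # made-up stand-in for the headword entry
    ("外資労協", "Japan Foreign Affiliated Trade Union"),
    ("外資取り入れ", "intake of foreign capital"),
    ("外資導入", "capital import/capital introduction/foreign capital inflow/"
                 "introduction of foreign capital"),
    ("外資流入", "influx of foreign capital"),
]
sorted_entries = classify_alc_entries("外資", sample)
for word, glosses in sorted_entries["compounds"]:
    print(word, glosses)
```

With this split, compounds such as 外資導入 stay attached to their own list of translations instead of being mixed into the definition of 外資.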

Also, have you tried the Kenkyuusha EPWING dictionary? I'm using 研究社 新和英大辞典 第5版 and it's great. It has tons of example sentences. I eventually plan on adding support to the plugin for EPWING dictionaries, and the first one I plan on adding is the Kenkyuusha dictionary.