Dictscrape is a library and an Anki plugin. The library is basically a web scraper for Yahoo's online Japanese dictionaries. The Anki plugin uses the library for semi-automatic card creation, so you can create new cards with definitions and example sentences from Yahoo's dictionaries in just a few clicks.
https://github.com/cdepillabout/dict-scrape
Here is a screenshot showing the Anki plugin. You can see the definition/example sentence selector in action.
![[Image: dictscrape.png]](https://github.com/cdepillabout/dict-scrape/raw/master/screenshots/dictscrape.png)
Many words still do not get parsed correctly, but if you want to try out the program, try looking up these words:
バリカン バリカン
うなる 唸る
きょうはく 強迫
なりすます 成り済ます
とかげ 蜥蜴
いく 行く
あかし 赤し
おもしろい 面白い
らくだ 駱駝
Current Status -- alpha01
Everything is still in the alpha stages. Currently, if you want to try it out and you're not a developer, you're out of luck. If you have experience with the command line, then check out the README.md in the git repo.
Development Help
I would really appreciate any help from other developers interested in this project.
There are three areas that are specifically lacking.
1) The GUI. This was my first PyQt project, so some of the code is kind of funky. I would appreciate suggestions/pull requests fixing any of the stupid things I did. The GUI code is in scraper.py and scraper_gui/. Unfortunately, I haven't gotten around to adding docstrings to the GUI code, so if you have any questions while you're reading/hacking the code, please feel free to ask.
Everything works for the most part, but I'm not sure of the correct way to handle the card creation dialogs. After you pick the parts of the definition and example sentences you want to use, the selector window closes and the next window pops up. It lets you rearrange the order of the definitions and pick the example sentence to use on your main card. When you click 'Okay', that window closes and another window pops up... What's the best way to do this? Basically, I want to make a "Wizard"-like system, where the same window is reused and you just keep clicking "Next" to go to the next screen. Should I use the Wizard dialog that comes with Qt? Do I need to use stacked widgets or something? I haven't done any research in this area, so I really have no idea.
2) The HTML/CSS. As you can see in the screenshot, the definition/sentence selector screen uses QWebView widgets to display the definitions from the dictionaries. The HTML/CSS for these widgets is located at the top of scraper_gui/ui/defwebviewui.py. I'm not much of a designer, so I just 適当に (haphazardly) designed it, but I'm sure someone with talent could make it look a lot nicer. I would really appreciate any pull requests that fix the design (I guess mostly the CSS).
3) The big hurdle left to conquer is parsing the definitions from Yahoo's dictionaries. There is so much crap in there and it seems like nearly every entry is formatted differently. Parsing the definitions is currently being done in the parse_definitions() function in dictscrape/dictionaries/yahoo/*.py.
I would really appreciate any suggestions/pull requests regarding parsing. Be sure to check out the section on testing in the README.md.
For non-developers, it would be awesome if you could let me know about any words that don't get parsed correctly. Ideally, you would create an Issue on GitHub. If you don't have a GitHub account, just post here (although I can't promise I'll be able to fix the parsing for your word).
Keep in mind that I try to take out a lot of the "useless" information from the definition, like the verb conjugation, the 補説 (supplementary note) parts, etc.
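To give a rough idea of what "taking out the useless information" can look like, here is a minimal sketch. It is not the actual dictscrape code: the function name `strip_extras`, the bracket patterns, and the sample entry are all made up for illustration, and I'm assuming the grammar tag and the 補説 note show up as full-width bracketed markers in the extracted text. The real entries are a lot messier than this.

```python
import re

# Hypothetical markers: these patterns are assumptions for illustration,
# not the real markup from Yahoo's dictionaries or from dictscrape.
GRAMMAR_TAG = re.compile(r"^［[^］]*］")               # a leading bracketed tag
NOTE_SECTION = re.compile(r"［補説］.*$", re.DOTALL)   # 補説 note to end of entry


def strip_extras(definition: str) -> str:
    """Return the definition without a leading grammar tag or a 補説 note."""
    text = definition.strip()
    # Drop the note first so a note-only entry collapses to an empty string.
    text = NOTE_SECTION.sub("", text)
    # Then drop one bracketed tag at the very start, if there is one.
    text = GRAMMAR_TAG.sub("", text)
    return text.strip()


# Made-up sample entry, not copied from any dictionary.
sample = "［動ラ五］低く長い声を出す。［補説］これは注記です。"
cleaned = strip_extras(sample)
print(cleaned)
```

An entry without either marker passes through unchanged, so a cleanup step like this should be safe to run on every definition.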

![[Image: OeXDkl.png]](http://i.imgur.com/OeXDkl.png)
![[Image: ryMDal.png]](http://i.imgur.com/ryMDal.png)