
Anki deck: JMdict complete

#1
Finally got around to making a searchable deck containing all of JMdict with Tangorin-style entries:

LINK

Default card colour scheme shamelessly stolen from cophnia61.

I had thought that a deck this size would cause performance issues in Anki, but so far it seems fine.

The | characters in the 'find' field can be used to compensate for Anki's (afaik) inability to match word boundaries, e.g.:

deck:current |ふう|   #match exact reading/writing

deck:current 人|   にん|  #ending with 人 read as にん

deck:current 船 (ship or boat)   #containing 船 and the words 'ship' or 'boat'
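
To illustrate why the | delimiters help, here is a minimal Python sketch. The field layout and the helper names (build_find_field, matches) are assumptions made for illustration; this is not the deck's actual build script, and the matching is only a rough stand-in for a plain substring search.

```python
def build_find_field(forms):
    """Join every writing and reading with '|' and wrap the result in '|'."""
    return "|" + "|".join(forms) + "|"


def matches(find_field, *terms):
    """Plain substring search: every term must occur somewhere in the field."""
    return all(term in find_field for term in terms)


# Hypothetical entries: (writing, reading)
fuu = build_find_field(["風", "ふう"])          # "|風|ふう|"
fuufu = build_find_field(["夫婦", "ふうふ"])    # "|夫婦|ふうふ|"
sannin = build_find_field(["三人", "さんにん"])  # "|三人|さんにん|"

# Without delimiters, ふう also hits ふうふ (a false positive).
plain_hit = matches(fuufu, "ふう")

# With delimiters, only the exact reading matches.
exact_hit = matches(fuu, "|ふう|")
exact_miss = matches(fuufu, "|ふう|")

# A trailing '|' anchors the end of a writing and of a reading.
ending_hit = matches(sannin, "人|", "にん|")
```

Because every form is fenced by | on both sides, a leading or trailing | in the search term acts as a start or end anchor.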

Please let me know if you find any errors/omissions and I'll post an update. I haven't started using it yet as it's still freshly squeezed, so it's quite possible there's some junk hiding in there somewhere. The issue with entries having multiple readings *and* writings was particularly irksome - hard to see what the best choice would be in all cases so I kind of copped out on it.

Where an automated choice had to be made as to what to put in Expression/Reading I've put "choices!" in FrontNotes as a temporary warning to edit the card when it comes up. This will hopefully be workable, though many of the choices are rather trivial (e.g. お尻 vs 御尻 vs オシリ), so could get annoying.

Edit: whoops, forgot to mention: card tags (JLPT levels etc) were taken from the 'Japanese corePLUS' deck, which doesn't seem to be on the shared decks list any more.

Edit2: better search examples above.
Having now used the deck a bit, I've noticed some possible improvements:

- separate parts of speech and other misc tags with |, e.g. |n|vs| etc, to make it possible to search for e.g. all nouns with |n|, or all expressions with |exp|, without getting false positives.

- improve the tags

- figure out wtf the '3 audio, 1 image' are, because I didn't include any (or so I thought ...)

- put 'japanese' in the note type so it works with kanji stats

Edit3: uploaded a new version with the above fixes. Still no idea wtf the image/audio are though.

It is now possible to search for e.g.

Dialect words:  ben| (a mere 300 total, of which 161 are Kansai-ben)
Math terms: |math
Computing terms: |comp
Swears: |vulg
etc

Todo next: make tag sets for novels, anime eps, core decks, etc.

Edit4: Fixed minor furigana issue, and added a 'common' tag (29459 cards)

Edit 07-feb-2016 - New release:

- Updated JMdict to the latest version
- Got rid of the annoying 'choices!' thing & relegated it to a tag. Added some javascript to the card template to conditionally show the alternatives
- Cleaned up the tags a bit more, though unfortunately only a small proportion of the core6k and core10k cards matched due to differing kanji usage. Todo: better matching.
- A few more furigana fixes
- About 900 entries that were in the old version of the dict but not in current version are tagged 'old1'. These can be deleted if you want; they seem mostly useless but not entirely, hence left in.
Edited: 2016-02-07, 12:37 pm
#2
This is great! I had a similar idea but it was too much work to do xD The only downside is that there are probably entries that aren't really words, like -的 and other suffixes, and things like that...

Do you know what would be great? A group effort to make a final, definitive vocab deck with only the common words (I know jmdict common is questionable, but it's still a starting point), sanitize the list to remove unnecessary entries, and add an example sentence for each word. And add various frequency fields like "novels", "newspapers", etc. A sort of new and definitive core deck.
I know it's a lot of work, but if we join forces maybe someday we'll do it.
For example, in the last few days I've made a new deck with 400 entries; the hard part is finding a good, easy example sentence. So it's not impossible: we could take the first 15k/16k words, split them between 20 users, have each one add an example sentence for every word in their assigned sublist, and merge the deck at the end.

I've also noticed that words like 姫 rank high in terms of frequency, at least in the list I'm using as a reference... and I'm sure it's not that common a word, so it would be hard to find a good enough frequency list...

I don't know if this makes sense but I felt like saying it xD
#3
Thanks for your feedback :)

Btw 姫 is very common indeed if you like anime/manga ;)

I used to swear by sentences for learning vocab, but since switching to single-word cards I much prefer them.

Sentences tend to distract from the target word and can even be a bit misleading as regards nuance, especially if poorly chosen (e.g. metaphorical uses, etc). Far better imv to use flashcards as a rough-and-ready way to bootstrap new vocab and hone the nuances in the wild.

Though sentences are still unfortunately necessary to illustrate grammar points :(
#4
I just took a look at it; awesome work. My only suggestion would be to pull an example sentence for each word from somewhere. Other than that, this will make making vocab cards much easier.
#5
(2015-06-06, 4:48 pm)anotherjohn Wrote: Btw 姫 is very common indeed if you like anime/manga ;)

Or history or mythology!
#6
@anotherjohn
Good work! I haven't looked through more than the fields, though. It seems like a wonderful resource, but I would like to get example sentences in there somehow... sounds like it'd be a huge undertaking though, so if I use it I'll just add them as I need them.

@cophnia61
Maybe with this (or something else) as a base, we could do a community-run spreadsheet with SVN. Seems like it'd be difficult to keep up interest in such a large project, though...
#7
(2015-06-06, 11:42 am)cophnia61 Wrote: Do you know what would be great? A group effort to make a final, definitive vocab deck with only the common words (I know jmdict common is questionable, but it's still a starting point), sanitize the list to remove unnecessary entries, and add an example sentence for each word. And add various frequency fields like "novels", "newspapers", etc. A sort of new and definitive core deck.

I think most of these things could be done programmatically without too much work.

For common words, you could check against several public frequency lists, or create your own from various corpora with CB's tool. Example sentences could be scraped from Google results and filtered by length and for sentences containing common words. Or perhaps example sentences could be pulled from selected corpora and filtered by the same kinds of criteria. Ultimately, everything would need to be quality-checked by knowledgeable humans though.
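
A rough Python sketch of that kind of filter, to make the idea concrete. It assumes the sentences have already been split into tokens (word segmentation is not shown), and the function name, thresholds and toy data are all made up for illustration.

```python
def pick_example(target, sentences, common_words, max_len=12, min_common_ratio=0.8):
    """Return the shortest tokenized sentence that contains `target`,
    has at most `max_len` tokens, and whose other tokens are mostly common words.
    Returns None if no sentence qualifies."""
    candidates = []
    for tokens in sentences:
        if target not in tokens or len(tokens) > max_len:
            continue
        others = [t for t in tokens if t != target]
        # Share of the surrounding tokens that appear in the common-word list.
        ratio = sum(t in common_words for t in others) / len(others) if others else 1.0
        if ratio >= min_common_ratio:
            candidates.append(tokens)
    return min(candidates, key=len) if candidates else None


# Toy data (hypothetical): a tiny common-word list and two pre-tokenized sentences.
common = {"私", "は", "を", "見た", "が", "いる"}
corpus = [
    ["私", "は", "船", "を", "見た"],
    ["船", "が", "蜃気楼", "の", "彼方", "に", "消えた"],
]

best = pick_example("船", corpus, common)
missing = pick_example("猫", corpus, common)
```

The thresholds would need tuning against real data, and the output would still want a human check as noted above.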
Edited: 2015-12-11, 8:38 pm
#8
Shameless self-bump:

- Updated JMdict to the latest version
- Got rid of the annoying 'choices!' thing & relegated it to a tag. Added some javascript to the card template to conditionally show the alternatives
- Cleaned up the tags a bit more, though unfortunately only a small proportion of the core6k and core10k cards matched due to differing kanji usage. Todo: better matching.
- A few more furigana fixes
- About 900 entries that were in the old version of the dict but not in current version are tagged 'old1'. These can be deleted if you want; they seem mostly useless but not entirely, hence left in.
#9
(2015-05-25, 11:31 am)anotherjohn Wrote: The | characters in the 'find' field can be used to compensate for Anki's (afaik) inability to match word boundaries, e.g.:

I made an addon for that: https://ankiweb.net/shared/info/125550793

It's useless for Japanese, though (the regex will actually treat Japanese characters as non-word characters - unless the locale is Japanese, I guess). But it's so much more intuitive for English that I made my personal copy default to word search.
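
The effect can be reproduced in Python 3 as a small sketch. Here re.ASCII is used to emulate a regex engine that only counts ASCII letters, digits and underscore as word characters; how the add-on itself builds its patterns is not shown in this thread, so treat this as an illustration only.

```python
import re

pattern = r"\bふう\b"

# ASCII-only word characters: kana are non-word characters, so there is no
# word/non-word transition anywhere and \b can never match.
ascii_hit = re.search(pattern, "ふう", flags=re.ASCII)

# Unicode-aware matching (the Python 3 default for str patterns): kana count
# as word characters, so the boundaries at both ends of the string match.
unicode_hit = re.search(pattern, "ふう")

# Even then, \b only fires at a word/non-word transition, so ふう inside ふうふ
# is rejected.
unicode_longer = re.search(pattern, "ふうふ")
```

Since Japanese text has no spaces between words, \b is of limited use even when kana and kanji are treated as word characters.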
#10
Looks good, but how would one go about merging the most recent update with the previous one? I don't want to lose my place, progress, and whatnot.
#11
The .tsv is 7.6M zipped so just about hotmailable - I can email it if you pm me your address if you want & you can import it into the current version without losing review history (which is what I've done).

I guess an 'import from .anki file' plugin would be ideal. I'll look into that later but I've got to go to work now :(
#12
For getting example sentences, you can use cb4960's Epwing2Anki.

You could also use Add note id for Anki, to add a unique id number to each card, and make it simpler to import new versions of the deck without losing or misplacing data.
#13
JMdict has its own uid system so that part's not a problem.
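
As an illustration of why a stable uid makes updates easy, here is a small Python sketch that compares two releases keyed on the uid. The column layout (uid in the first column, tab-separated) and the sample rows are assumptions for the example, not the deck's actual .tsv format.

```python
import csv
import io


def load_by_uid(tsv_text):
    """Parse tab-separated rows into {uid: [remaining fields]}."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return {row[0]: row[1:] for row in reader if row}


def diff_releases(old, new):
    """Return (added, removed, changed) uid lists between two releases."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(uid for uid in old.keys() & new.keys() if old[uid] != new[uid])
    return added, removed, changed


# Hypothetical rows: uid, expression, reading, meaning
old_release = "1000\t船\tふね\tship\n1001\t姫\tひめ\tprincess\n"
new_release = "1000\t船\tふね\tship; boat\n1002\t風\tふう\tstyle\n"

added, removed, changed = diff_releases(load_by_uid(old_release), load_by_uid(new_release))
```

Entries in the removed list correspond to the kind of cards tagged 'old1' above; changed entries can be updated in place because the uid still identifies the same note.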
#14
Out of curiosity, what does the tag "znt" mean?
#15
(2016-02-09, 8:42 am)RandomQuotes Wrote: Out of curiosity, what does the tag "znt" mean?
Vocab from the first 12 vols of Zero no Tsukaima, taken from a deck that used to be on Ankiweb.

Annoyingly I've discovered some daft furi errors affecting ~150 cards.

I'll fix them and update the deck shortly & do an addon to import from .anki files.
#16
Fixed the daft furi errors.

You can already choose .anki format when selecting a file to import from, so presumably this can be used to update an existing installation (though I haven't tried it myself).
#17
Fantastic share, thank you kind sir.
#18
Thanks for sharing this wonderful deck. Do you plan, by any chance, to add some example sentences in future updates?

If I use "Sanseido Definitions" and "Japanese Example Sentences" plugins, will my deck be inconsistent with any future updates you may release?
Edited: 2018-01-07, 11:12 am