"Learning With Texts" software + Japanese?


 
Reply #51 - February 19, 7:50 am
Stansfield123 Member
From: Europe Registered: 2011-04-17 Posts: 799

RawToast wrote:

I guess you should start with an edict dictionary file from here instead:

http://www.edrdg.org/jmdict/edict.html

Since the edict file is simply a specially formatted text file, it shouldn't be too difficult to change the separator. Notepad++ struggles with the size of the file, so I couldn't really get a look at how verbs are handled.

I downloaded that file; unfortunately, all the entries look like this:
29日 [にじゅうきゅうにち] /(n) (1) twenty-ninth day of the month/(2) twenty-nine days/

This is the Edict2 format. I don't know what encoding that is or what to do with it.

[edit] Never mind, I followed the link to the Monash site and found a UTF-8 version. It turns out the first file's encoding isn't supported by Microsoft (and I'm running Windows). I think I might be able to convert this with just the replace function in SublimeText; I won't even need to code anything.
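If you only have the original EDICT download and your editor can't open its encoding, you can also re-encode it yourself. A minimal sketch in Python, assuming the source file is EUC-JP (the encoding the original EDICT file is distributed in); the function name convert_to_utf8 and any file paths you pass to it are hypothetical:

```python
from pathlib import Path


def convert_to_utf8(src: str, dst: str, src_encoding: str = "euc_jp") -> int:
    """Read src using src_encoding and write the same text to dst as UTF-8.

    Returns the number of newline characters, i.e. roughly the number of
    dictionary entries when each entry sits on its own line."""
    # Decoding raises UnicodeDecodeError if src is not really in src_encoding.
    text = Path(src).read_text(encoding=src_encoding)
    Path(dst).write_text(text, encoding="utf-8")
    return text.count("\n")
```

For example, convert_to_utf8("edict", "edict_utf8.txt") would write a UTF-8 copy alongside the original (both names are placeholders).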

Last edited by Stansfield123 (February 19, 8:04 am)

Reply #52 - February 19, 10:34 am
Stansfield123 Member
From: Europe Registered: 2011-04-17 Posts: 799

OK, I did it. Unfortunately, some of the terms won't work (the ones with more than one term on a single line), but those are only a small percentage.
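The lines that fail are presumably EDICT2 entries with several headwords separated by ";" (e.g. 日本;日本国). A sketch of a small parser that instead emits one row per headword. This is not the find-and-replace method used above. Assumptions: the output is tab-separated with columns term, translation, reading, which you would then map to Term, Translation and Romanization in LWT's import form; the handling of "(P)" tags and "EntL..." entry IDs follows EDICT2 conventions as I understand them; the function names are hypothetical.

```python
import re

# Headword field, optional [reading] field, then the /gloss/gloss/.../ field.
LINE_RE = re.compile(
    r"^(?P<heads>[^ \[/]+)\s*(?:\[(?P<reads>[^\]]*)\]\s*)?/(?P<glosses>.*)/$"
)
# Parenthesised tags attached to headwords/readings, e.g. "(P)" or "(iK)".
TAG_RE = re.compile(r"\([^)]*\)")


def parse_edict_line(line):
    """Split one EDICT/EDICT2 line into (headwords, readings, translation).

    Returns None for lines that don't match the expected layout."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    heads = [TAG_RE.sub("", h) for h in m.group("heads").split(";") if h]
    reads = [TAG_RE.sub("", r) for r in (m.group("reads") or "").split(";") if r]
    # Drop empty pieces, the "(P)" priority marker and the "EntL..." ID field.
    glosses = [
        g for g in m.group("glosses").split("/")
        if g and g != "(P)" and not g.startswith("EntL")
    ]
    return heads, reads, "; ".join(glosses)


def edict_to_tsv_lines(lines):
    """Yield one 'term<TAB>translation<TAB>reading' line per headword, so an
    entry with several headwords becomes several separate terms."""
    for line in lines:
        parsed = parse_edict_line(line)
        if parsed is None:
            continue  # skip lines that don't look like dictionary entries
        heads, reads, translation = parsed
        reading = ", ".join(reads)
        for head in heads:
            yield f"{head}\t{translation}\t{reading}"
```

Splitting on ";" before writing rows is the design choice that matters here: each headword gets its own row, so none of them is lost as an unmatchable combined term.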

Also, the file had to be broken up into 10 pieces, because LWT, by default, seems to have an upload limit of 2MB. Here is a zip containing the CSV file as a whole, plus the same file broken up into 10 parts: https://drive.google.com/file/d/0B8iOWN … sp=sharing
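Splitting the file by hand is error-prone; a short script can cut it at line boundaries so that no entry is broken across two parts. A sketch assuming the limit is 2 MB (taken here as 2 * 1024 * 1024 bytes) and that each part can be imported on its own; split_by_size and the _partN file names are hypothetical:

```python
from pathlib import Path

LIMIT_BYTES = 2 * 1024 * 1024  # assumed upload limit of 2 MB


def split_by_size(src, limit=LIMIT_BYTES):
    """Split src into numbered part files of at most `limit` bytes each,
    cutting only between lines. Returns the list of part paths.

    A single line longer than `limit` still ends up in a part of its own."""
    src = Path(src)
    chunks, current, size = [], [], 0
    with src.open("rb") as f:  # bytes, so multi-byte characters are never split
        for line in f:
            # Start a new chunk if adding this line would cross the limit.
            if current and size + len(line) > limit:
                chunks.append(current)
                current, size = [], 0
            current.append(line)
            size += len(line)
    if current:
        chunks.append(current)

    parts = []
    for i, lines in enumerate(chunks, start=1):
        part = src.with_name(f"{src.stem}_part{i}{src.suffix}")
        part.write_bytes(b"".join(lines))
        parts.append(part)
    return parts
```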

This is a backup of my database after I uploaded the terms (the 10 files, one by one). In theory, you could just restore from this file (and not bother uploading), but once again there's an issue with the size limit. You can apparently change that size limit, though, so I'll share this as well:
https://drive.google.com/file/d/0B8iOWN … sp=sharing

All credit for the content of the files goes to the original source, mentioned by RawToast above. Thank you to them for making this dictionary available for free.

P.S. Just to recap what this does:

In LWT, no words or expressions are imported by default. Instead, the software finds words (strings separated by spaces) without knowing their meaning, and provides a popup on hover which, when clicked, looks up the term in an online dictionary and displays the results on the same page.
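A toy model of that behaviour, assuming words are exactly the whitespace-separated tokens and ignoring LWT's real handling of punctuation and multi-word expressions (LWT itself is written in PHP; this is not its code, and the function names are hypothetical). Each word is paired with its saved translation, or None if you would still have to look it up; importing a large term list simply makes saved_terms much bigger:

```python
from typing import Dict, List, Optional, Tuple


def annotate(text: str, saved_terms: Dict[str, str]) -> List[Tuple[str, Optional[str]]]:
    """Split a space-separated text into words and pair each word with its
    saved translation, or None if it has not been saved yet."""
    # str.split() with no argument splits on any run of whitespace.
    return [(word, saved_terms.get(word)) for word in text.split()]


def unknown_words(text: str, saved_terms: Dict[str, str]) -> List[str]:
    """The distinct words you would still have to look up, sorted."""
    return sorted({word for word, gloss in annotate(text, saved_terms) if gloss is None})
```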

However, you can also save those dictionary entries, for each word, so that you don't have to look for them again. But it's tedious to do for each new word you encounter. By default, you're only given a few dozen terms with known meaning.

Importing this (massive) list of terms with the "import terms" feature takes 100,000+ terms, imports them, and categorizes them as "Learning - Level 1", as if you had looked them up one by one. In other words, it saves a lot of busywork without taking anything away from the useful features of LWT.

For this to work, the correct LWT settings are: don't treat each character as a word, and don't delete spaces (you have to change both, since the default is YES for each). Also, your texts must have spaces between words (easy to insert with MeCab, or with online apps that use MeCab, such as http://nihongo.dpwright.com/spaces/index.php).

Reply #53 - February 20, 9:49 am
RawToast お巡りさん ("police officer")
From: UK Registered: 2012-09-03 Posts: 431

Your character issue can usually be solved in Notepad++ by changing the character encoding to Shift-JIS: Encoding -> Character Sets -> Japanese -> Shift JIS. The same thing happens with the 'innocent novels' erm... novels?
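The Notepad++ menu route works file by file. If you want to check which encoding a file is in before converting it, a rough heuristic is to try decoding it with each candidate in turn. A sketch with hypothetical names; it only tells you which encoding decodes without errors first, and short or ASCII-only inputs can decode under several candidates, so the order of the list matters:

```python
from typing import Optional, Sequence

# Tried in order; UTF-8 first because it rejects most legacy-encoded bytes.
CANDIDATES = ("utf-8", "euc_jp", "shift_jis")


def guess_japanese_encoding(data: bytes, candidates: Sequence[str] = CANDIDATES) -> Optional[str]:
    """Return the first candidate encoding that decodes `data` without
    errors, or None if none of them does. A heuristic, not a detector."""
    for enc in candidates:
        try:
            data.decode(enc)
        except UnicodeDecodeError:
            continue
        return enc
    return None
```

In practice you would read the first few kilobytes of a file in binary mode and pass those bytes in, then open the file with the returned encoding.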

LWT, by default, seems to have an upload limit of 2MB.

Urgh...

However, you can also save those dictionary entries, for each word, so that you don't have to look for them again. But it's tedious to do for each new word you encounter. By default, you're only given a few dozen terms with known meaning.

Just like LingQ, but it's pretty quick to add your words on that site.

I am sure there are many who will find your work to be of great benefit :) I know I couldn't be bothered to start from scratch...
