Japanese 先生 = Core 10000?


Reply #26 - 2011 January 22, 4:31 pm
overture2112 Member
From: New York Registered: 2010-05-16 Posts: 400

nest0r wrote:

So you're close? Nice.

Yes, I have all the data and everything can be easily correlated now with the exception of audio.

Reply #27 - 2011 January 22, 9:10 pm
nest0r Member
Registered: 2007-10-19 Posts: 5236 Website

Awesome. The forum will give you Blahah's firstborn child in exchange for this, when you're done.

scotty28 Member
From: Kyoto Registered: 2008-12-20 Posts: 12

Just like to say a big thanks for getting this done. I've been working my way through Nukemarine's Anki decks and wishing, more and more, that the rest of the 10,000 words were there to complete it.

I have Japanese 先生 and it's pretty good, but to have it in Anki - awesome!

Reply #29 - 2011 January 24, 2:47 am
overture2112 Member
From: New York Registered: 2010-05-16 Posts: 400

I saw a neat thread at http://tinyurl.com/VGWBZ927 while playing Sega Pup Toads.  It may take a while, but I'll also post a link to some audio of cool words and sentences I heard while playing.

In completely unrelated news I was able to link up the:
- word's internal index #
- word's internal index pack (A, B, or S)
- word's expression
- word's reading
- word's english meaning(s)
- word's japanese meaning(s) (for the ones I checked, this was just the expression and reading, but I kept this data anyway)
- word's possible audio file based on the recommendation of using the phonetic ordering (it works about half the time- usually off by 1 or 2)
- an example sentence's expression
- sentence's possible audio (it matches the word audio- but as mentioned, sometimes it's wrong)

Unfortunately this means I still don't have:
- example sentence english meaning
- 100% correct audio

Blahah's observation that the audio is phonetically sorted was extremely helpful in getting it correct to within +0 to +3 positions.  I'd think that given 4 options you might be able to automate choosing the audio for a reading, but I'm not really knowledgeable about voice recognition in English, let alone Japanese.

I have all the data in all the files except the JIB_??A.res.dict and JIB_??B.res.dict files, which I'm almost certain specify the information needed to resolve the remaining issues above.  I base this on the contents of the JIB_??S.res.dict file (which, unlike A and B, is plaintext XML) and on the same company's Japanese<->English dictionary program (where the similarly named files also use plaintext XML).

I think I'll have to take a break for now, but I'll post some information in the hope that someone else has better luck figuring out those few remaining problematic files.  That said, it's useful enough to use at this point, and perhaps someone could go through the audio, find the correct mapping of filenames to words/sentences, and post it.

---- JIB_??{A,B}.res.dict info:
* I disassembled the code for the program and found a number of functions with names like
-[Dictionary setEncryptionPassword:]
-[NSData(CZDataExtensions) encryptWithKey:]:
-[NSData(CZDataExtensions) decryptWithKey:]:

so it's quite possible the .res.dict files are encrypted.  That said, there's also code for Russian and German dictionaries and other stuff that's clearly not used (but probably is by their other products), so maybe not.

* The blobs pointed to by the .idx file are about 884-1030 bytes for the plaintext xml ones (S pack) and nearly the same for the unknown ones (A and B pack), which implies they store similar data.  Also, compressing a few from S pack results in about 50% reduction- thus it's unlikely the unknown blobs from A/B are compressed (since otherwise we should expect the size of the blobs to be smaller).

* The unknown blobs have similar patterns in their data.  In fact, the first 30-50 bytes _exactly_ match one of 19 different sequences across the 28,000+ blobs.  That doesn't match the behavior of any encryption algorithm I know, as anything modern would look far more random.  From a quick frequency analysis, the byte values in the unknown A/B packs appear to be roughly evenly distributed, whereas the S pack is drastically uneven (e.g., the '<' and '>' characters of the XML tags are far more common), so a simple Caesar cipher or something similar seems unlikely as well.
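The two size and frequency arguments above can be put into numbers with a few lines of Python. This is a minimal sketch, not code from mach4.py or crypt.py: zlib is an assumed stand-in for whatever compressor produced the ~50% figure, and the XML record is a made-up stand-in for an S pack blob (the real schema isn't shown here). It also illustrates why a Caesar-style byte substitution can't explain a flat distribution: substitution only relabels byte values, so the ciphertext has exactly the same entropy as the plaintext.

```python
import math
import os
import zlib
from collections import Counter


def compression_ratio(blob: bytes) -> float:
    """zlib-compressed size / original size; well below 1.0 means the blob has redundancy."""
    if not blob:
        raise ValueError("empty blob")
    return len(zlib.compress(blob, 9)) / len(blob)


def shannon_entropy(data: bytes) -> float:
    """Bits per byte: 8.0 for perfectly uniform bytes, noticeably lower for text/XML."""
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def caesar_shift(data: bytes, key: int) -> bytes:
    """Toy byte-wise Caesar cipher, only used to show it leaves entropy unchanged."""
    return bytes((b + key) % 256 for b in data)


# Hypothetical stand-ins: these tag names are invented, not the real S pack schema.
record = ("<entry><expression>会う</expression><reading>あう</reading>"
          "<meaning>to meet</meaning></entry>")
xml_like = (record * 10).encode("utf-8")  # 970 bytes, roughly one S pack blob
random_like = os.urandom(len(xml_like))   # stands in for compressed/encrypted data

for label, blob in [("xml-like", xml_like),
                    ("caesar(xml)", caesar_shift(xml_like, 13)),
                    ("random", random_like)]:
    print(f"{label:12} ratio={compression_ratio(blob):.2f} "
          f"entropy={shannon_entropy(blob):.2f} top={Counter(blob).most_common(3)}")
```

Running the same two functions over the real A/B blobs and the S pack blobs would turn "mostly equal distribution" into a concrete bits-per-byte comparison.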

Thus, given the evidence, the data is either in some ad hoc encryption or (more likely) just some strange binary format that I wasn't able to recognize.  Hopefully someone else will have better luck.

Last edited by overture2112 (2011 January 24, 3:20 am)

Reply #30 - 2011 January 24, 5:15 am
Blahah Member
From: Cambridge, UK Registered: 2008-07-15 Posts: 715 Website

Nice work overture2112, thanks so much for putting in all this effort. It's definitely at a usable stage right now, but I'll see if I can work on solving those last few steps. It might take a *long* while, but it'll be a nice chance for me to actually learn to code.

The previous python script you posted was really useful for figuring out how you got to where you did, do you think you could post the rest of the code you used? I'll see if I can muddle my way through it.

Thanks again!

overture2112 Member
From: New York Registered: 2010-05-16 Posts: 400

Blahah wrote:

The previous python script you posted was really useful for figuring out how you got to where you did, do you think you could post the rest of the code you used? I'll see if I can muddle my way through it.

https://github.com/jre2/JapaneseStudy/t … er/jsensei

The files of note are `mach4.py` (the latest version of the code) and possibly `crypt.py` (an older and messier fork with some misc utils to explore the unknown blobs from the A/B .res.dict files).  Also, the type signatures on functions are just approximations for documentation of intent.

overture2112 Member
From: New York Registered: 2010-05-16 Posts: 400

Apparently my code in crypt.py was off by 1 when checking for the lengths of common prefixes and I missed something potentially useful.

Prefix frequency and length analysis:

Code:

Count    Length    Prefix first char
3432x    33b    '\x01'
57x    48b    'a'
189x    32b    'C'
18x    32b    '\x9a'
2910x    56b    '6'
1032x    48b    '\xaa'
198x    32b    'm'
216x    48b    '\xcc'
1140x    32b    '\xee'
132x    56b    '\xa1'
5826x    56b    'S'
408x    32b    '\x14'
48x    56b    'v'
906x    48b    '\x19'
3x    1032b    '{'
33x    56b    'n'
11898x    56b    '\x9d'
21x    48b    '\xb3'
390x    32b    '^'

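Thus, apart from the 33-byte prefix, the common prefix lengths are all multiples of 8 bytes (32, 48, 56, plus the 1032-byte outlier shared by only 3 blobs), and so also divisible by 4 (i.e., a whole number of 32-bit words).  The largest group, about 41% of the blobs (11,898 of the 28,857 in the table), shares the 56-byte prefix starting with \x9d.  This seems interesting and perhaps worth exploring.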
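Since the earlier bug was an off-by-one in measuring shared prefixes, here is one way the table above could be computed. It's a sketch under two assumptions: the blobs are already loaded as a list of bytes, and grouping by first byte separates the 19 families (which the table implies). It relies on the fact that the prefix shared by a whole set of byte strings equals the prefix shared by its lexicographic minimum and maximum, so only one comparison loop is needed per group.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def common_prefix_len(blobs: List[bytes]) -> int:
    """Number of leading bytes shared by every blob in the list."""
    # The common prefix of a set of strings equals that of its min and max.
    first, last = min(blobs), max(blobs)
    n = 0
    while n < len(first) and first[n] == last[n]:
        n += 1
    return n  # index of the first mismatch == count of matching bytes (not n - 1)


def prefix_table(blobs: Iterable[bytes]) -> Dict[int, Tuple[int, int]]:
    """Map first byte -> (number of blobs, length of their common prefix)."""
    groups: Dict[int, List[bytes]] = defaultdict(list)
    for blob in blobs:
        if blob:
            groups[blob[0]].append(blob)
    return {fb: (len(g), common_prefix_len(g)) for fb, g in groups.items()}


# Toy data only; real input would be the blobs pointed to by the .idx file.
demo = [b"\x9dABCDxyz1", b"\x9dABCDqq", b"SHELLO1", b"SHELLO2"]
for first_byte, (count, length) in prefix_table(demo).items():
    print(f"{count}x    {length}b    {bytes([first_byte])!r}")
```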

Last edited by overture2112 (2011 January 24, 11:27 am)

nest0r Member
Registered: 2007-10-19 Posts: 5236 Website

Great progress! You seem to be having fun with that technical stuff. Don't forget to study Japanese at some point. ;p

Reply #34 - 2011 January 24, 1:20 pm
overture2112 Member
From: New York Registered: 2010-05-16 Posts: 400

As mentioned before and completely unrelated to anything else in this thread, here's a link to some interesting information about m4a files that I found while listening to Sega Pup Toads: http://tinyurl.com/SPQ0O0LU

Reply #35 - 2011 January 24, 2:59 pm
overture2112 Member
From: New York Registered: 2010-05-16 Posts: 400

Complete tangent that in no way relates to this thread: The final sentences of Sega Pup Toads reminded me of this article http://tinyurl.com/PYXSYDTI

Anyway, now that

- a pack of word audio files
- a pack of sentence audio files
- a tsv with the field data

are all mysteriously available on the internet and easily imported into anki, I'll call it quits for now (unless someone makes a breakthrough with the .res.dict files) and take advantage of this new study resource.

If you *somehow* wind up with a new addition to your vocab deck that happens to be oddly similar to the vocab in the J Sensei app, you might want to thank them and buy it.

Reply #36 - 2011 January 24, 3:11 pm
nest0r Member
Registered: 2007-10-19 Posts: 5236 Website

I don't know what you're talking about but... amazing! Thanks! For nothing, I mean. I hope you aren't a wrathful programmer-god, because I was bluffing before.

http://bible.cc/exodus/4-23.htm

And yes, it's clear the app is a worthwhile purchase for the iDevice.

Last edited by nest0r (2011 January 24, 3:17 pm)

Reply #37 - 2011 January 24, 3:16 pm
Blahah Member
From: Cambridge, UK Registered: 2008-07-15 Posts: 715 Website

nest0r wrote:

I don't know what you're talking about but... amazing! Thanks! For nothing, I mean. I hope you aren't a wrathful programmer-god, because I was bluffing before.

http://bible.cc/exodus/4-23.htm

Could have told me it was a bluff! I had already broken the news to my pregnant girlfriend...

Reply #38 - 2011 January 24, 3:36 pm
nest0r Member
Registered: 2007-10-19 Posts: 5236 Website

Blahah wrote:

nest0r wrote:

I don't know what you're talking about but... amazing! Thanks! For nothing, I mean. I hope you aren't a wrathful programmer-god, because I was bluffing before.

http://bible.cc/exodus/4-23.htm

Could have told me it was a bluff! I had already broken the news to my pregnant girlfriend...

Hahaha. Glad you've got your priorities straight.

Last edited by nest0r (2011 January 24, 3:38 pm)

Reply #39 - 2011 January 24, 4:28 pm
nest0r Member
Registered: 2007-10-19 Posts: 5236 Website

My eyes glazed over while reading so I might've missed something, but I noticed that the audio is off by 1? Like if a file were to be named: lettersnumber9.ext which plays a sentence, the text of that sentence is associated with a file named lettersnumber8.ext (and so on for that sentence and the text it's associated with). Maybe I screwed up something when opening?

Last edited by nest0r (2011 January 24, 4:28 pm)

Reply #40 - 2011 January 24, 5:09 pm
Blahah Member
From: Cambridge, UK Registered: 2008-07-15 Posts: 715 Website

Yeah the audio still needs to be matched. There must be a place where the audio filenames are associated with the vocab, just have to find it.

Reply #41 - 2011 January 24, 5:10 pm
overture2112 Member
From: New York Registered: 2010-05-16 Posts: 400

nest0r wrote:

My eyes glazed over while reading so I might've missed something, but I noticed that the audio is off by 1? Like if a file were to be named: lettersnumber9.ext which plays a sentence, the text of that sentence is associated with a file named lettersnumber8.ext (and so on for that sentence and the text it's associated with). Maybe I screwed up something when opening?

Unfortunately the audio files are often off by 0, 1, 2, or (rarely) 3.  I'm not sure if there's a good solution to this beyond manually correcting them.  If anyone spots any patterns then perhaps I could adjust it.

The likely reasons are:

1) The seiyuu recorded them in phonetic order but later re-recorded certain ones out of order?  Seems unlikely since that would presumably throw them completely off.

2) The library I used to sort the words did it incorrectly.  Perhaps someone could look at https://github.com/jre2/JapaneseStudy/r … rds.sorted to confirm this.

Reply #42 - 2011 January 24, 5:14 pm
overture2112 Member
From: New York Registered: 2010-05-16 Posts: 400

Blahah wrote:

Yeah the audio still needs to be matched. There must be a place where the audio filenames are associated with the vocab, just have to find it.

Yes, the JIB_ejS.res.dict file (which has plaintext xml) clearly explains which audio file goes with which word, and which example sentences (english+japanese+audio file).  Unfortunately the S pack is only the first 50 words and the A/B packs use some other format, are encrypted, or otherwise elude me.

That said, the word audio seems to always match the sentence audio (same filename but 'jw' instead of 'js'), at least from all the ones I checked.
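If the jw/js correspondence holds everywhere, the sentence audio filename can be derived from the word audio filename instead of being assigned separately. The pattern below is inferred only from names quoted in this thread (e.g. jw00027a.m4a and js00027a.m4a); the meaning of the trailing letter is unknown, so it is simply carried over. The function name is hypothetical.

```python
import re

# Inferred from filenames in this thread: jw/js, a 5-digit number, one letter, .m4a
AUDIO_NAME = re.compile(r"^(jw|js)(\d{5})([a-z])\.m4a$")


def sentence_audio_for(word_audio: str) -> str:
    """Return the js... sentence file that pairs with a jw... word file."""
    m = AUDIO_NAME.match(word_audio)
    if m is None or m.group(1) != "jw":
        raise ValueError(f"not a word-audio filename: {word_audio!r}")
    number, suffix = m.group(2), m.group(3)
    return f"js{number}{suffix}.m4a"


print(sentence_audio_for("jw00027a.m4a"))
```

A side benefit of matching the full pattern is that stray files such as the tmp.m4a mentioned in this thread are rejected instead of silently paired.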

Reply #43 - 2011 January 24, 5:20 pm
nest0r Member
Registered: 2007-10-19 Posts: 5236 Website

Curses, I thought I'd figured out the magic pattern since so far they're all off by 1 (edit: n/m). Oh well, for my purposes of occasional consultation, a manual adjustment will be fine, methinks.

Maybe if they're all within that range of 0-3, then each audio file can be associated with that range of text sentences (extraneous to be eliminated per encounter). Is it never higher than the audio file number, only equal or less?

Edit: Just ignore me, I'll leave this to those good at this sort of subject. n00b out.

Last edited by nest0r (2011 January 24, 5:37 pm)

Reply #44 - 2011 January 24, 5:38 pm
overture2112 Member
From: New York Registered: 2010-05-16 Posts: 400

nest0r wrote:

Curses, I thought I'd figured out the magic pattern since so far they're all off by 1 (edit: n/m). Oh well, for my purposes of occasional consultation, a manual adjustment will be fine, methinks.

Maybe if they're all within that range of 0-3, then each audio file can be associated with that range of text sentences (extraneous to be eliminated per encounter). Is it never higher than the audio file number, only equal or less?

I think I listened to about 50 of them, mostly spread throughout the deck, and that's how it was, always dead on or off by 1-3 forward (never back, at least for what I saw).  So yeah,  I guess I could add a few fields that contain a handful of the next audio files to facilitate easier manual correction in anki, that sounds like a good idea.  I'll try to do so before I go to sleep.
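The "next few audio files" fields could look roughly like this. Assumptions: the phonetic index N maps directly onto the number in jwNNNNNa.m4a, the trailing "a" is constant, and candidates are only generated forward (offsets 0 to 3), matching the observation above. Each candidate is wrapped in Anki's [sound:...] tag so it can sit in its own field of the .tsv. The helper names are hypothetical.

```python
from typing import List, Optional


def candidate_audio(index: int, prefix: str = "jw", max_offset: int = 3,
                    last_index: Optional[int] = None) -> List[str]:
    """Filenames for positions index .. index + max_offset (forward only)."""
    stop = index + max_offset
    if last_index is not None:
        stop = min(stop, last_index)  # don't run past the final audio file
    return [f"{prefix}{i:05d}a.m4a" for i in range(index, stop + 1)]


def as_anki_sound_fields(filenames: List[str]) -> List[str]:
    """Wrap each filename in Anki's [sound:...] tag, one field per candidate."""
    return [f"[sound:{name}]" for name in filenames]


# Hypothetical .tsv row: expression, reading, then the four word-audio candidates.
row = ["会う", "あう"] + as_anki_sound_fields(candidate_audio(27))
print("\t".join(row))
```

Passing prefix="js" gives the matching sentence-audio candidates for the same entry.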

Reply #45 - 2011 January 25, 5:45 am
Matthias Member
From: Germany Registered: 2005-10-27 Posts: 37

@overture2112:
Maybe the problem is that you keyed the matching on the hiragana readings.
There are only 9619 hiragana readings but 9669 sentences.

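Example: you matched あう only to jw00027a.m4a and js00027a.m4a, but there are four entries with that reading:

会う    また 会いましょう。 (to meet: "Let's meet again.")
合う    この 靴 は 私 の 足 に 合って いる。 (to fit: "These shoes fit my feet.")
逢う    ついに 素晴らしい 女性 に あえた。 (to meet: "At last I met a wonderful woman.")
遭う    彼 は 交通 事故 に 遭った。 (to encounter: "He was in a traffic accident.")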

So you missed 00028 - 00030 in the matching table.
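The point can be shown with a tiny sketch (hypothetical helper names, not the code in mach4.py). If every entry that shares a reading is given the position of the first such entry, all four あう entries land on 27 while the recordings run from 27 to 30, and the gap grows by one for each further entry with the same reading. That would be consistent with the "dead on or off by 1-3, never backwards" pattern reported earlier. The order of expressions within one reading (会う, 合う, 逢う, 遭う) is assumed to match the recording order.

```python
from typing import Dict, List


def first_position_index(sorted_readings: List[str], start: int = 1) -> List[int]:
    """Current approach: every entry gets the position of the FIRST entry with its reading."""
    first_seen: Dict[str, int] = {}
    indices = []
    for pos, reading in enumerate(sorted_readings, start):
        first_seen.setdefault(reading, pos)
        indices.append(first_seen[reading])
    return indices


def per_entry_index(sorted_readings: List[str], start: int = 1) -> List[int]:
    """Suggested fix: each entry (word + example sentence) gets its own position."""
    return list(range(start, start + len(sorted_readings)))


# The four あう entries (会う, 合う, 逢う, 遭う), in an assumed recording order.
readings = ["あう", "あう", "あう", "あう"]
old = first_position_index(readings, start=27)
new = per_entry_index(readings, start=27)
print(old)                                # [27, 27, 27, 27]
print(new)                                # [27, 28, 29, 30]
print([n - o for o, n in zip(old, new)])  # [0, 1, 2, 3]
```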

Hope this helps.

Last edited by Matthias (2011 January 25, 6:07 am)

Reply #46 - 2011 January 25, 1:21 pm
overture2112 Member
From: New York Registered: 2010-05-16 Posts: 400

EDIT: I'm updating this as I confirm things [in red] and greying out answered questions or incorrect statements

Matthias wrote:

@overture2112:
Maybe the problem is that you orientated yourself at the hiragana words.
There are only 9619 hiragana words but 9669 sentences.

Note: I haven't spent much time listening to the audio beyond some simple tests while compiling everything over the weekend nor have I had the chance to inspect your example (I'll try to do so soon-ish but I don't have access to it right now).  As such, the post below is mostly me thinking out loud while ignorant of a more thorough inspection.
Ah, I understand your example now- you were referring to the sentence audio being incorrect since I always link it to the first instance of a reading.


So currently what I do is take a list of all vocab word readings, sort them phonetically, and then use this to assign a phonetic index to the vocab word of each of the 9669 sentences based on that sorting order.  Then for each sentence, if N is the phonetic index, I assign the Nth audio file (the jw version to the word, the js version to the sentence).  Here are some potential issues:

1) How do you get the list of readings?  I originally grabbed them from the A*.dat and B*.dat files, but I could also get the reading of the vocab word for each sentence (let's call that the .tsv data).

The .dat files have 9619 words, thus 9619 readings.
The .tsv data (generated from sentences) has 9669 sentences, thus 9669 vocab words, thus 9669 readings.

Of course, the .dat files have only 9417 unique expressions, 8679 unique readings, and 9517 unique pairs of expression+reading (adding the S pack's S1.dat just adds 50 duplicate words).  The .tsv sentence data has 9604 unique sentence expressions, 9405 unique vocab expressions, 8679 unique vocab readings, and 9541 unique pairs of vocab expression+reading.

There are 9621 audio files of each type (plus an erroneous tmp.m4a in sentences, but this appears after all other files when sorted alphanumerically, so we can ignore it) and they all have different checksums, thus they contain different data.  That said, there could be multiple (slightly different) recordings of the same word/sentence, but there's no easy way to verify that except manual effort.
There are multiple (marginally different) recordings of the same reading if the word has multiple example sentences or if there are multiple expressions with the same reading.


=> The problem I'm trying to point out is that none of these numbers match up, so there's no obvious choice as to which is best.

2) I'm currently assigning sentence audio based on the sentence's vocab word's audio filename, mostly because it seems to work better than no sentence audio at all, but certainly not because it's a robust idea.  Also, as noted above, there are 9604 unique sentence expressions and 9621 sentence audio files?

3) You call the readings hiragana words above.  Unfortunately (at least for sorting purposes), the readings are in hiragana or katakana, which initially caused some problems (e.g., strcoll from Python's locale module placed the katakana words after all of the hiragana words).  I switched to a Perl library that seems to intermix them properly, but perhaps there's some other caveat that I'm missing.
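The katakana problem can also be sidestepped in Python by building the sort key by hand. This is a rough sketch of my own, not what mach4.py or the Perl library does: fold katakana to hiragana, compare first with dakuten/handakuten removed and small kana enlarged, and only break ties on the voiced/small forms. It approximates dictionary order, e.g. かいとう < がいとう < かいほう and いつか < いっけん as in Matthias's list below, but it is not a full implementation of Japanese collation; in particular the long-vowel mark ー is left untouched.

```python
import unicodedata
from typing import Tuple

SMALL_TO_LARGE = str.maketrans("ぁぃぅぇぉっゃゅょゎ", "あいうえおつやゆよわ")


def kata_to_hira(s: str) -> str:
    """Map katakana ァ..ヶ (U+30A1..U+30F6) onto hiragana by subtracting 0x60."""
    return "".join(chr(ord(c) - 0x60) if "\u30a1" <= c <= "\u30f6" else c for c in s)


def primary_key(s: str) -> str:
    """Fold katakana, strip dakuten/handakuten, enlarge small kana."""
    decomposed = unicodedata.normalize("NFD", kata_to_hira(s))  # が -> か + U+3099
    bare = "".join(c for c in decomposed if c not in "\u3099\u309a")
    return bare.translate(SMALL_TO_LARGE)


def reading_sort_key(s: str) -> Tuple[str, str, str]:
    # Ties on the primary key are broken by voiced/small forms, then by script.
    # Caveat: ー (U+30FC) is not handled, so e.g. オーバー sorts too late.
    return (primary_key(s), kata_to_hira(s), s)


words = ["がいとう", "かいほう", "クラス", "かいとう", "くらい", "いつか", "いっけん"]
print(sorted(words, key=reading_sort_key))
```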

Last edited by overture2112 (2011 January 25, 3:20 pm)

Reply #47 - 2011 January 25, 4:31 pm
Matthias Member
From: Germany Registered: 2005-10-27 Posts: 37

Here are the hiragana/katakana readings that have more than one entry, with their audio file (the word and sentence numbers are identical). The last number shows how many entries share that reading in your matching list.
あう    jw00027a.m4a    4 (previous example)
あおぐ    jw00035a.m4a    2
あか    jw00042a.m4a    2
あがる    jw00051a.m4a    4
あき    jw00057a.m4a    4
あく    jw00067a.m4a    3
あける    jw00091a.m4a    3
あげる    jw00094a.m4a    4
あじ    jw00115a.m4a    3
あせる    jw00132a.m4a    2
あたり    jw00148a.m4a    2
あつい    jw00156a.m4a    3
あつさ    jw00165a.m4a    2
あと    jw00186a.m4a    2
あぶら    jw00211a.m4a    2
あまり    jw00230a.m4a    3
あめ    jw00239a.m4a    2
あやまる    jw00250a.m4a    2
あらい    jw00256a.m4a    3
あらわす    jw00273a.m4a    3
あんぜん    jw00320a.m4a    2
い    jw00332a.m4a    2
いえ    jw00352a.m4a    2
いか    jw00356a.m4a    2
いかが    jw00361a.m4a    2
いかり    jw00368a.m4a    2
いき    jw00371a.m4a    3
いきおい    jw00376a.m4a    2
いくじ    jw00393a.m4a    2
いくら    jw00399a.m4a    2
いご    jw00406a.m4a    2
いこう    jw00410a.m4a    2
いし    jw00419a.m4a    3
いじ    jw00422a.m4a    2
いじょう    jw00432a.m4a    2
いせい    jw00445a.m4a    2
いぜん    jw00448a.m4a    2
いたい    jw00454a.m4a    2
いたむ    jw00465a.m4a    3
いためる    jw00468a.m4a    2
いち    jw00474a.m4a    4
いつか    jw00502a.m4a    2
いっけん    jw00506a.m4a    2
いっしゅ    jw00514a.m4a    2
いったい    jw00526a.m4a    2
いつも    jw00543a.m4a    2
いと    jw00548a.m4a    2
いど    jw00550a.m4a    2
いどう    jw00552a.m4a    2
いま    jw00572a.m4a    2
いらい    jw00599a.m4a    2
いりょう    jw00605a.m4a    2
いる    jw00607a.m4a    2
うえ    jw00649a.m4a    2
うえる    jw00655a.m4a    2
うたう    jw00701a.m4a    2
うち    jw00707a.m4a    3
うつ    jw00721a.m4a    3
うつす    jw00726a.m4a    3
うつる    jw00735a.m4a    3
うまい    jw00755a.m4a    2
うむ    jw00764a.m4a    2
うる    jw00794a.m4a    2
うるさい    jw00799a.m4a    2
え    jw00830a.m4a    2
えいせい    jw00844a.m4a    2
えき    jw00856a.m4a    2
えん    jw00882a.m4a    4
えんぎ    jw00890a.m4a    2
えんしゅう    jw00896a.m4a    2
おう    jw00935a.m4a    3
オーバー    jw00986a.m4a    2
おか    jw01000a.m4a    2
おかしい    jw01008a.m4a    2
おかす    jw01010a.m4a    3
おきる    jw01027a.m4a    2
おく    jw01029a.m4a    3
おくる    jw01042a.m4a    2
おこす    jw01047a.m4a    3
おこる    jw01053a.m4a    2
おごる    jw01055a.m4a    2
おさえる    jw01057a.m4a    2
おさまる    jw01061a.m4a    3
おさめる    jw01064a.m4a    5
おじさん    jw01078a.m4a    2
おしゃべり    jw01086a.m4a    2
おじょうさん    jw01089a.m4a    2
おす    jw01093a.m4a    2
おとな    jw01147a.m4a    2
おばさん    jw01172a.m4a    2
おもい    jw01195a.m4a    2
おもいきり    jw01201a.m4a    2
おり    jw01242a.m4a    2
おりる    jw01249a.m4a    2
おる    jw01252a.m4a    3
おろす    jw01263a.m4a    3
おん    jw01271a.m4a    2
か    jw01295a.m4a    3
かい    jw01314a.m4a    2
かいき    jw01332a.m4a    2
かいぎ    jw01334a.m4a    2
かいこ    jw01344a.m4a    2
かいしゃ    jw01359a.m4a    2
かいじょう    jw01367a.m4a    2
かいせい    jw01379a.m4a    2
かいせつ    jw01381a.m4a    2
かいそう    jw01384a.m4a    2
かいだん    jw01390a.m4a    2
かいてん    jw01397a.m4a    2
かいとう    jw01400a.m4a    2
がいとう    jw01403a.m4a    2
かいほう    jw01411a.m4a    2
かう    jw01428a.m4a    2
かえす    jw01430a.m4a    2
かえりみる    jw01434a.m4a    2
かえる    jw01436a.m4a    6
かがく    jw01454a.m4a    2
かかる    jw01464a.m4a    3
かき    jw01468a.m4a    3
かく    jw01483a.m4a    6
かぐ    jw01491a.m4a    2
がく    jw01493a.m4a    2
かくしん    jw01508a.m4a    3
かくりつ    jw01535a.m4a    2
かげ    jw01545a.m4a    2
かけい    jw01549a.m4a    2
かける    jw01555a.m4a    5
かこう    jw01564a.m4a    4
かじ    jw01583a.m4a    3
かず    jw01603a.m4a    2
かぜ    jw01608a.m4a    2
かせん    jw01617a.m4a    2
かた    jw01626a.m4a    4
かち    jw01662a.m4a    2
かつ    jw01668a.m4a    2
がっかい    jw01672a.m4a    2
がっき    jw01676a.m4a    2
かっこう    jw01681a.m4a    2
かって    jw01691a.m4a    2
かてい    jw01706a.m4a    4
かね    jw01732a.m4a    3
かぶせる    jw01752a.m4a    2
かぶる    jw01756a.m4a    2
かみ    jw01765a.m4a    4
から    jw01794a.m4a    2
がら    jw01796a.m4a    2
からい    jw01799a.m4a    2
からから    jw01803a.m4a    2
かり    jw01820a.m4a    2
かりゅう    jw01823a.m4a    2
かわ    jw01844a.m4a    3
かわく    jw01853a.m4a    2
かわす    jw01855a.m4a    2
かわら    jw01858a.m4a    2
かわる    jw01861a.m4a    2
かん    jw01864a.m4a    4
かんかく    jw01880a.m4a    2
かんき    jw01885a.m4a    3
かんご    jw01896a.m4a    2
かんこう    jw01899a.m4a    2
かんじ    jw01906a.m4a    3
かんしゅう    jw01912a.m4a    2
かんしょう    jw01916a.m4a    3
かんじょう    jw01919a.m4a    2
かんしょく    jw01922a.m4a    2
かんしん    jw01925a.m4a    2
かんせつ    jw01932a.m4a    2
かんそう    jw01936a.m4a    2
かんたん    jw01941a.m4a    2
かんてい    jw01948a.m4a    2
かんりょう    jw01972a.m4a    2
かんわ    jw01976a.m4a    2
き    jw01978a.m4a    2
きか    jw01990a.m4a    2
きかい    jw01993a.m4a    3
きかん    jw02005a.m4a    3
きく    jw02019a.m4a    4
きぐ    jw02023a.m4a    2
きけん    jw02028a.m4a    2
きげん    jw02030a.m4a    4
きこう    jw02035a.m4a    2
きじ    jw02047a.m4a    2
きしゃ    jw02054a.m4a    2
きじゅん    jw02058a.m4a    2
きしょう    jw02060a.m4a    3
きせい    jw02069a.m4a    5
きそ    jw02080a.m4a    2
きたい    jw02090a.m4a    3
きちょう    jw02101a.m4a    2
きつい    jw02107a.m4a    2
きのう    jw02132a.m4a    2
きみ    jw02155a.m4a    2
きゅう    jw02184a.m4a    6
きゅうくつ    jw02198a.m4a    2
きゅうこう    jw02202a.m4a    3
きゅうそく    jw02215a.m4a    2
きゅうよう    jw02224a.m4a    2
きょういく    jw02236a.m4a    2
きょうかい    jw02240a.m4a    2
きょうぎ    jw02245a.m4a    2
きょうこう    jw02251a.m4a    2
きょうよう    jw02279a.m4a    3
きょうりょく    jw02283a.m4a    2
きょく    jw02291a.m4a    2
きらい    jw02311a.m4a    2
きり    jw02319a.m4a    2
きりつ    jw02327a.m4a    2
きる    jw02331a.m4a    2
きれ    jw02333a.m4a    2
きん    jw02345a.m4a    3
きんこう    jw02361a.m4a    2
きんし    jw02364a.m4a    2
きんせい    jw02372a.m4a    2
く    jw02390a.m4a    3
くび    jw02495a.m4a    2
くむ    jw02509a.m4a    2
くも    jw02511a.m4a    2
くらい    jw02518a.m4a    2
クラス    jw02526a.m4a    2
くるま    jw02550a.m4a    2
くれる    jw02559a.m4a    2
くわえる    jw02566a.m4a    2
ぐん    jw02573a.m4a    3
ぐんしゅう    jw02580a.m4a    2
けい    jw02591a.m4a    2
けいかい    jw02598a.m4a    2
けいじ    jw02616a.m4a    2
けっかん    jw02684a.m4a    2
けっこう    jw02690a.m4a    2
けっしょう    jw02700a.m4a    2
けん    jw02732a.m4a    2
げんかく    jw02746a.m4a    2
げんき    jw02749a.m4a    2
けんこう    jw02757a.m4a    3
げんし    jw02770a.m4a    2
げんしゅ    jw02774a.m4a    2
けんしょう    jw02779a.m4a    2
げんしょう    jw02781a.m4a    2
げんそく    jw02790a.m4a    2
げんてん    jw02800a.m4a    2
けんとう    jw02803a.m4a    2
こ    jw02825a.m4a    2
ご    jw02831a.m4a    2
こい    jw02835a.m4a    4
こうい    jw02846a.m4a    3
こうえん    jw02857a.m4a    4
こうおん    jw02861a.m4a    2
こうか    jw02863a.m4a    4
こうかい    jw02868a.m4a    3
こうがい    jw02871a.m4a    2
こうかん    jw02876a.m4a    2
こうぎ    jw02879a.m4a    2
こうきゅう    jw02883a.m4a    2
こうぎょう    jw02886a.m4a    2
こうげん    jw02897a.m4a    2
こうご    jw02899a.m4a    2
こうこう    jw02901a.m4a    2
こうし    jw02912a.m4a    4
こうしゃ    jw02919a.m4a    2
こうじょう    jw02923a.m4a    2
こうすい    jw02927a.m4a    2
こうせい    jw02930a.m4a    3
こうそう    jw02937a.m4a    2
こうそく    jw02940a.m4a    3
こうちょう    jw02949a.m4a    2
こうてい    jw02955a.m4a    6
こうどう    jw02965a.m4a    2
こうばい    jw02976a.m4a    2
こうふ    jw02983a.m4a    2
こうふく    jw02985a.m4a    2
こうぶつ    jw02987a.m4a    2
こうめい    jw02998a.m4a    2
こうりつ    jw03007a.m4a    2
こうれい    jw03014a.m4a    2
こえる    jw03020a.m4a    2
コート    jw03024a.m4a    2
こくめい    jw03069a.m4a    2
ここ    jw03082a.m4a    2
こしょう    jw03116a.m4a    2
こちら    jw03138a.m4a    2
こと    jw03157a.m4a    2
サービス    jw03282a.m4a    2
さいかい    jw03289a.m4a    2
さいきん    jw03296a.m4a    2
さいけつ    jw03299a.m4a    2
さいご    jw03304a.m4a    2
さいさん    jw03307a.m4a    2
さいしゅう    jw03313a.m4a    2
さがす    jw03359a.m4a    2
さき    jw03371a.m4a    2
さく    jw03378a.m4a    5
さける    jw03405a.m4a    2
さす    jw03426a.m4a    5
さっき    jw03450a.m4a    2
さどう    jw03469a.m4a    2
さばく    jw03472a.m4a    2
さます    jw03483a.m4a    2
さめる    jw03491a.m4a    2
さる    jw03510a.m4a    2
さわる    jw03519a.m4a    2
さん    jw03521a.m4a    3
さんか    jw03526a.m4a    2
さんせい    jw03541a.m4a    2
し    jw03561a.m4a    6
じ    jw03571a.m4a    2
じえい    jw03590a.m4a    2
しお    jw03597a.m4a    2
しがい    jw03605a.m4a    2
しかく    jw03608a.m4a    2
じかく    jw03610a.m4a    2
じかん    jw03621a.m4a    2
しき    jw03624a.m4a    3
じき    jw03627a.m4a    2
じこ    jw03657a.m4a    2
しこう    jw03659a.m4a    2
じこく    jw03663a.m4a    2
しじ    jw03673a.m4a    2
ししゃ    jw03676a.m4a    2
じしゅ    jw03680a.m4a    2
しじょう    jw03688a.m4a    2
じしん    jw03693a.m4a    3
しせつ    jw03704a.m4a    2
しぜん    jw03706a.m4a    2
した    jw03715a.m4a    3
じたい    jw03719a.m4a    2
してん    jw03796a.m4a    2
じどう    jw03801a.m4a    2
しほう    jw03840a.m4a    2
しぼう    jw03842a.m4a    2
しま    jw03850a.m4a    2
しまる    jw03856a.m4a    2
しめい    jw03868a.m4a    2
しめる    jw03877a.m4a    4
しも    jw03883a.m4a    2
じゃま    jw03928a.m4a    2
しゅう    jw03942a.m4a    2
じゅう    jw03947a.m4a    3
しゅうかん    jw03955a.m4a    2
しゅうし    jw03969a.m4a    3
しゅうしゅう    jw03977a.m4a    2
じゅうしょう    jw03980a.m4a    2
じゅうたい    jw03991a.m4a    2
しゅうとく    jw04006a.m4a    2
しゅうりょう    jw04024a.m4a    2
しゅっけつ    jw04063a.m4a    2
しょう    jw04121a.m4a    3
しよう    jw04124a.m4a    2
じょう    jw04131a.m4a    2
しょうか    jw04135a.m4a    2
しょうかい    jw04137a.m4a    2
しょうがい    jw04139a.m4a    3
じょうきょう    jw04153a.m4a    2
しょうじき    jw04168a.m4a    2
しょうしょう    jw04177a.m4a    2
しょうじょう    jw04179a.m4a    2
じょうず    jw04185a.m4a    2
しょうすう    jw04187a.m4a    2
しょうたい    jw04196a.m4a    2
しょうてん    jw04207a.m4a    2
しょうにん    jw04215a.m4a    2
しょうひん    jw04227a.m4a    2
しょうめい    jw04238a.m4a    2
じょうりゅう    jw04252a.m4a    2
しょうりょう    jw04254a.m4a    2
しょき    jw04262a.m4a    2
じょし    jw04295a.m4a    2
しりつ    jw04336a.m4a    2
しる    jw04340a.m4a    2
しろ    jw04345a.m4a    2
しん    jw04354a.m4a    2
しんか    jw04361a.m4a    2
しんぎ    jw04369a.m4a    2
しんこう    jw04376a.m4a    4
じんこう    jw04381a.m4a    2
しんこく    jw04383a.m4a    2
しんじつ    jw04391a.m4a    2
しんせい    jw04403a.m4a    2
しんせつ    jw04407a.m4a    2
しんちょう    jw04421a.m4a    2
しんにゅう    jw04430a.m4a    2
しんねん    jw04432a.m4a    2
しんり    jw04454a.m4a    2
す    jw04464a.m4a    2
すいこう    jw04472a.m4a    2
すいせん    jw04481a.m4a    2
すいとう    jw04491a.m4a    2
ずいぶん    jw04497a.m4a    2
すき    jw04528a.m4a    2
すくう    jw04544a.m4a    2
すすめる    jw04570a.m4a    2
すみ    jw04623a.m4a    3
すむ    jw04628a.m4a    3
する    jw04643a.m4a    4
せい    jw04656a.m4a    6
せいかい    jw04669a.m4a    2
せいかく    jw04671a.m4a    2
せいかつ    jw04673a.m4a    2
せいき    jw04676a.m4a    2
せいきゅう    jw04679a.m4a    2
せいざ    jw04689a.m4a    2
せいさく    jw04692a.m4a    2
せいさん    jw04694a.m4a    4
せいし    jw04698a.m4a    3
せいしょ    jw04710a.m4a    2
せいそう    jw04720a.m4a    2
せいちょう    jw04728a.m4a    2
せいとう    jw04737a.m4a    2
せいねん    jw04740a.m4a    2
せいふく    jw04748a.m4a    2
せいめい    jw04756a.m4a    3
せいやく    jw04760a.m4a    2
せいり    jw04764a.m4a    2
せかい    jw04773a.m4a    2
せき    jw04777a.m4a    2
せつ    jw04794a.m4a    2
ぜっこう    jw04805a.m4a    2
ぜひ    jw04832a.m4a    2
せめる    jw04841a.m4a    2
せん    jw04849a.m4a    3
ぜん    jw04856a.m4a    2
ぜんかい    jw04862a.m4a    2
せんさい    jw04875a.m4a    2
せんざい    jw04877a.m4a    2
ぜんしん    jw04889a.m4a    2
せんたく    jw04904a.m4a    2
せんとう    jw04917a.m4a    3
ぜんぶ    jw04928a.m4a    2
そう    jw04953a.m4a    3
そうかん    jw04965a.m4a    2
そうさ    jw04976a.m4a    2
そうさい    jw04978a.m4a    2
そうさく    jw04980a.m4a    2
そうしょく    jw04990a.m4a    2
そうぞう    jw04996a.m4a    2
そこ    jw05045a.m4a    2
そっくり    jw05066a.m4a    2
そなえる    jw05076a.m4a    3
そなわる    jw05079a.m4a    2
そば    jw05087a.m4a    2
そる    jw05104a.m4a    2
た    jw05137a.m4a    2
だい    jw05144a.m4a    3
だいいち    jw05152a.m4a    2
たいき    jw05166a.m4a    2
たいけい    jw05175a.m4a    2
たいしょう    jw05197a.m4a    4
だいしょう    jw05201a.m4a    2
たいせい    jw05209a.m4a    3
たいせき    jw05212a.m4a    2
たいそう    jw05217a.m4a    2
だいたい    jw05220a.m4a    2
たいちょう    jw05228a.m4a    2
たいひ    jw05245a.m4a    2
だいべん    jw05257a.m4a    2
たえる    jw05284a.m4a    2
たく    jw05307a.m4a    2
たこ    jw05323a.m4a    2
たずねる    jw05341a.m4a    2
ただ    jw05344a.m4a    3
たたかう    jw05350a.m4a    2
たつ    jw05374a.m4a    7
たてこむ    jw05401a.m4a    2
たてる    jw05409a.m4a    2
たとえ    jw05412a.m4a    2
たま    jw05446a.m4a    2
たる    jw05480a.m4a    2
だん    jw05491a.m4a    2
たんか    jw05497a.m4a    2
たんき    jw05501a.m4a    2
ち    jw05547a.m4a    2
ちか    jw05559a.m4a    2
ちかく    jw05566a.m4a    3
ちち    jw05600a.m4a    2
ちゅう    jw05638a.m4a    2
ちゅうしゃ    jw05661a.m4a    2
ちゅうせい    jw05669a.m4a    2
ちゅうりゅう    jw05684a.m4a    2
ちょう    jw05687a.m4a    4
ちょうしゅう    jw05707a.m4a    2
ちょうだい    jw05716a.m4a    2
ちょっかん    jw05752a.m4a    2
ちり    jw05766a.m4a    2
つい    jw05780a.m4a    2
ついきゅう    jw05783a.m4a    3
つうか    jw05798a.m4a    2
つかまる    jw05827a.m4a    2
つき    jw05833a.m4a    2
つぎ    jw05834a.m4a    2
つく    jw05849a.m4a    5
つぐ    jw05854a.m4a    3
つける    jw05873a.m4a    4
つごう    jw05878a.m4a    2
つとめ    jw05898a.m4a    2
つとめる    jw05902a.m4a    3
つみ    jw05932a.m4a    2
つむ    jw05935a.m4a    2
つや    jw05943a.m4a    2
つゆ    jw05946a.m4a    2
つる    jw05964a.m4a    3
ていおん    jw05984a.m4a    2
ていか    jw05986a.m4a    2
ていき    jw05989a.m4a    2
ていけい    jw05996a.m4a    3
てん    jw06131a.m4a    2
てんか    jw06138a.m4a    4
てんさい    jw06156a.m4a    2
でんせん    jw06165a.m4a    2
でんとう    jw06173a.m4a    2
と    jw06191a.m4a    2
とう    jw06201a.m4a    4
どう    jw06207a.m4a    3
どうか    jw06223a.m4a    2
とうき    jw06227a.m4a    5
どうき    jw06233a.m4a    2
とうさん    jw06248a.m4a    2
とうじ    jw06251a.m4a    2
どうし    jw06253a.m4a    3
とうしょ    jw06262a.m4a    2
とうじょう    jw06264a.m4a    2
どうじょう    jw06266a.m4a    3
どうせい    jw06275a.m4a    4
とうそう    jw06282a.m4a    2
とうち    jw06289a.m4a    2
とうぶん    jw06312a.m4a    2
どうよう    jw06325a.m4a    2
とく    jw06367a.m4a    4
どく    jw06372a.m4a    2
とくい    jw06374a.m4a    2
とける    jw06406a.m4a    2
とし    jw06420a.m4a    3
とじる    jw06436a.m4a    2
とまる    jw06501a.m4a    2
とめる    jw06506a.m4a    2
とも    jw06508a.m4a    2
とりかえす    jw06540a.m4a    2
とりもどす    jw06560a.m4a    2
とる    jw06566a.m4a    4
とれる    jw06574a.m4a    2
とんとん    jw06587a.m4a    2
どんどん    jw06589a.m4a    2
なおす    jw06620a.m4a    2
なおる    jw06622a.m4a    2
なか    jw06624a.m4a    3
なかなか    jw06637a.m4a    2
なく    jw06659a.m4a    2
なくす    jw06662a.m4a    2
なくなる    jw06664a.m4a    2
なす    jw06681a.m4a    2
なにか    jw06705a.m4a    2
なまり    jw06727a.m4a    2
なみ    jw06729a.m4a    2
なめる    jw06735a.m4a    2
ならう    jw06739a.m4a    2
ならす    jw06741a.m4a    2
なる    jw06749a.m4a    3
に    jw06789a.m4a    3
におい    jw06795a.m4a    2
におう    jw06797a.m4a    2
にち    jw06839a.m4a    3
にっかん    jw06848a.m4a    2
にっちゅう    jw06860a.m4a    2
にる    jw06900a.m4a    2
にんき    jw06910a.m4a    2
ね    jw06944a.m4a    3
ねる    jw07016a.m4a    2
ねんど    jw07035a.m4a    2
ねんとう    jw07037a.m4a    2
のう    jw07050a.m4a    3
のうこう    jw07056a.m4a    2
のぞく    jw07083a.m4a    2
のぞむ    jw07087a.m4a    2
のばす    jw07095a.m4a    2
のびる    jw07101a.m4a    2
のぼる    jw07108a.m4a    3
のり    jw07115a.m4a    2
のる    jw07131a.m4a    2
のろい    jw07134a.m4a    2
は    jw07142a.m4a    3
はい    jw07156a.m4a    2
はえる    jw07213a.m4a    2
はかる    jw07230a.m4a    5
はく    jw07238a.m4a    3
はくし    jw07250a.m4a    2
はげる    jw07277a.m4a    2
はし    jw07287a.m4a    3
はじめ    jw07298a.m4a    2
はち    jw07339a.m4a    3
はっせい    jw07378a.m4a    2
はっそう    jw07380a.m4a    2
はで    jw07403a.m4a    2
はな    jw07410a.m4a    2
はなす    jw07420a.m4a    3
はなれる    jw07435a.m4a    2
はねる    jw07439a.m4a    2
はやい    jw07464a.m4a    2
はら    jw07479a.m4a    2
はる    jw07499a.m4a    3
はれる    jw07507a.m4a    2
はん    jw07510a.m4a    3
ばん    jw07516a.m4a    2
はんえい    jw07521a.m4a    2
はんこう    jw07535a.m4a    2
はんにち    jw07570a.m4a    2
はんめん    jw07586a.m4a    2
はんらん    jw07588a.m4a    2
ひ    jw07592a.m4a    7
ひきのばす    jw07653a.m4a    2
ひく    jw07659a.m4a    3
ひけつ    jw07668a.m4a    2
ひこう    jw07671a.m4a    2
ひしょ    jw07691a.m4a    2
ひと    jw07753a.m4a    2
ひとで    jw07770a.m4a    2
ひとめ    jw07784a.m4a    2
ひなん    jw07797a.m4a    2
ひにん    jw07803a.m4a    2
ひび    jw07812a.m4a    2
ひょう    jw07838a.m4a    3
ひょうし    jw07854a.m4a    2
ひろう    jw07904a.m4a    2
ふか    jw07974a.m4a    2
ふかい    jw07977a.m4a    2
ふきん    jw08001a.m4a    2
ふく    jw08003a.m4a    3
ふくし    jw08010a.m4a    2
ふくしゅう    jw08013a.m4a    2
ふける    jw08029a.m4a    3
ふこう    jw08032a.m4a    2
ふさい    jw08039a.m4a    2
ふじゅん    jw08057a.m4a    2
ふじょ    jw08059a.m4a    2
ふしん    jw08064a.m4a    3
ふじん    jw08067a.m4a    2
ふだん    jw08091a.m4a    2
ふつう    jw08101a.m4a    2
ふへん    jw08150a.m4a    2
ふよう    jw08169a.m4a    2
ふり    jw08184a.m4a    2
ふる    jw08196a.m4a    2
ぶん    jw08227a.m4a    3
ぶんか    jw08232a.m4a    2
へいき    jw08286a.m4a    3
へいこう    jw08290a.m4a    3
ページ    jw08307a.m4a    2
ぺこぺこ    jw08309a.m4a    2
へた    jw08314a.m4a    2
ぺらぺら    jw08332a.m4a    2
へる    jw08336a.m4a    2
へん    jw08340a.m4a    2
へんかん    jw08347a.m4a    2
へんしん    jw08366a.m4a    2
ほう    jw08385a.m4a    3
ほうい    jw08388a.m4a    2
ほうがく    jw08398a.m4a    2
ほうき    jw08402a.m4a    3
ぼうし    jw08417a.m4a    2
ほうじん    jw08423a.m4a    2
ほうそう    jw08432a.m4a    2
ぼうちょう    jw08440a.m4a    2
ぼうとう    jw08446a.m4a    2
ほか    jw08479a.m4a    2
ぼける    jw08499a.m4a    2
ほけん    jw08501a.m4a    2
ほこり    jw08508a.m4a    2
ほしょう    jw08521a.m4a    3
ホット    jw08548a.m4a    2
ほる    jw08581a.m4a    2
ほん    jw08586a.m4a    2
ぼん    jw08587a.m4a    2
まあまあ    jw08627a.m4a    2
まいる    jw08643a.m4a    2
まえ    jw08646a.m4a    2
まく    jw08669a.m4a    4
まざる    jw08691a.m4a    2
まじる    jw08698a.m4a    2
まじわる    jw08700a.m4a    2
まずい    jw08705a.m4a    3
まぜる    jw08712a.m4a    2
また    jw08714a.m4a    2
まつ    jw08741a.m4a    3
まん    jw08818a.m4a    2
み    jw08837a.m4a    2
みかた    jw08861a.m4a    2
ミス    jw08886a.m4a    2
みず    jw08888a.m4a    2
みせ    jw08904a.m4a    2
みち    jw08922a.m4a    2
みる    jw09010a.m4a    3
みんぞく    jw09021a.m4a    2
むく    jw09050a.m4a    2
むける    jw09057a.m4a    2
むこう    jw09061a.m4a    2
むし    jw09068a.m4a    2
むしょく    jw09080a.m4a    2
むち    jw09102a.m4a    2
むり    jw09123a.m4a    2
め    jw09131a.m4a    2
めいあん    jw09136a.m4a    2
めいし    jw09146a.m4a    2
めん    jw09216a.m4a    2
もう    jw09231a.m4a    2
もうける    jw09235a.m4a    2
もち    jw09281a.m4a    2
もつ    jw09290a.m4a    2
もっとも    jw09298a.m4a    2
もと    jw09308a.m4a    2
もの    jw09318a.m4a    2
もも    jw09341a.m4a    2
もる    jw09352a.m4a    2
やかん    jw09370a.m4a    2
やく    jw09378a.m4a    3
やさしい    jw09401a.m4a    2
やとう    jw09429a.m4a    2
やぶる    jw09436a.m4a    2
やぶれる    jw09438a.m4a    2
やむ    jw09445a.m4a    2
やめる    jw09447a.m4a    2
やる    jw09461a.m4a    2
やわらかい    jw09465a.m4a    2
ゆうかん    jw09483a.m4a    2
ゆうこう    jw09490a.m4a    2
ゆき    jw09532a.m4a    2
よ    jw09568a.m4a    2
よう    jw09573a.m4a    3
ようい    jw09576a.m4a    2
ようき    jw09581a.m4a    2
ようご    jw09587a.m4a    3
ようし    jw09593a.m4a    3
ようじ    jw09596a.m4a    2
ようしき    jw09598a.m4a    2
ようしょ    jw09603a.m4a    2
ようしょく    jw09605a.m4a    2
ようせい    jw09612a.m4a    2
ようやく    jw09630a.m4a    2
よく    jw09639a.m4a    3
よける    jw09659a.m4a    2
よだん    jw09692a.m4a    2
よち    jw09694a.m4a    2
よむ    jw09726a.m4a    2
よる    jw09737a.m4a    5
よん    jw09758a.m4a    2
りか    jw09791a.m4a    2
りょう    jw09836a.m4a    5
りょうかい    jw09845a.m4a    2
りょうしん    jw09858a.m4a    2
れい    jw09896a.m4a    3
ろうか    jw09951a.m4a    2
わ    jw09988a.m4a    2
わかい    jw09999a.m4a    2
わかれる    jw10011a.m4a    2
わく    jw10018a.m4a    3
わずらう    jw10035a.m4a    2
わり    jw10059a.m4a    2
わん    jw10083a.m4a    2
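
If the list above needs further processing, a few lines of Python can load it. This is a minimal sketch. It assumes the columns are separated by whitespace and that the third column is the number of entries sharing that reading and audio file. That looks likely but hasn't been confirmed. `parse_rows` and `AudioRow` are made-up names.

```python
import re
from typing import NamedTuple


class AudioRow(NamedTuple):
    reading: str  # kana reading, e.g. "つく"
    audio: str    # audio file name, e.g. "jw05849a.m4a"
    count: int    # assumed: number of entries sharing this reading/audio file


# reading, then a jwNNNNNa.m4a file name, then an integer count
ROW = re.compile(r"^(\S+)\s+(jw\d{5}a\.m4a)\s+(\d+)$")


def parse_rows(text):
    """Parse 'reading  file  count' lines; skip anything else (blank lines etc.)."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            rows.append(AudioRow(m.group(1), m.group(2), int(m.group(3))))
    return rows


sample = """つく    jw05849a.m4a    5
つぐ    jw05854a.m4a    3
ひ    jw07592a.m4a    7"""

rows = parse_rows(sample)
ambiguous_entries = sum(r.count for r in rows)
most_shared = max(rows, key=lambda r: r.count)
print(ambiguous_entries, most_shared.reading, most_shared.audio)  # 15 ひ jw07592a.m4a
```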

Hope it's clearer now.

Last edited by Matthias (2011 January 25, 4:39 pm)

Reply #48 - 2011 January 25, 5:13 pm
Matthias Member
From: Germany Registered: 2005-10-27 Posts: 37

@overture2112: I sent you an email about this, but I can't attach a file from the forum.

Therefore I've expanded the first example to cover the first hundred audio files:
あう    00027a    会う    また 会いましょう。
あう    00027a    合う    この 靴 は 私 の 足 に 合って いる。
あう    00027a    逢う    ついに 素晴らしい 女性 に あえた。
あう    00027a    遭う    彼 は 交通 事故 に 遭った。
あおぐ    00035a    仰ぐ    私たち は 夜空 を 仰いだ。
あおぐ    00035a    扇ぐ    彼 は 扇子 で 顔 を あおいで いる。
あか    00042a    垢    タオル で 垢 を こすった。
あか    00042a    赤    信号 が 赤 に 変わりました。
あがる    00051a    上がる    今日 は 仕事 が 早く 上がった。
あがる    00051a    上がる    冷めない うち に どうぞ お上がり 下さい。
あがる    00051a    上がる    彼 は 人前 だ と 上がって しまう。
あがる    00051a    上がる    私たち は 2階 に 上がった。
あき    00057a    秋    彼女 は 秋 に 結婚 します。
あき    00057a    空き    部屋 の 空き は あります か。
あき    00057a    開き    この ブラウス は 後ろ開き です。
あき    00057a    飽き    そろそろ 今 の 生活 に 飽き が きて います。
あく    00067a    悪    彼 は 悪 を 憎んで います。
あく    00067a    空く    後ろ の 席 が 空いて います。
あく    00067a    開く    電車 の ドア が 開きました。
あける    00091a    明ける    もうすぐ 夜 が 明ける。
あける    00091a    空ける    彼女 は お年寄り の ため に 席 を 空けた。
あける    00091a    開ける    窓 を 開けて ください。
あげる    00094a    あげる    この 本 あなた に あげます。
あげる    00094a    上げる    彼 は 荷物 を あみだな に 上げた。
あげる    00094a    挙げる    例 を 幾つ か 挙げて みましょう。
あげる    00094a    揚げる    彼女 は 夕食 に 天ぷら を 揚げました。
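
In the table above, one audio id covers several expressions. Grouping the rows by audio id therefore gives the candidate (expression, sentence) pairs for each file. Below is a minimal Python sketch. It assumes the four columns are reading, audio id, expression and sentence, and it copies in only the first six rows. `group_by_audio` is a hypothetical helper.

```python
from collections import defaultdict

# First six rows of the table: (reading, audio id, expression, sentence)
rows = [
    ("あう", "00027a", "会う", "また 会いましょう。"),
    ("あう", "00027a", "合う", "この 靴 は 私 の 足 に 合って いる。"),
    ("あう", "00027a", "逢う", "ついに 素晴らしい 女性 に あえた。"),
    ("あう", "00027a", "遭う", "彼 は 交通 事故 に 遭った。"),
    ("あおぐ", "00035a", "仰ぐ", "私たち は 夜空 を 仰いだ。"),
    ("あおぐ", "00035a", "扇ぐ", "彼 は 扇子 で 顔 を あおいで いる。"),
]


def group_by_audio(rows):
    """Map each audio id to the (expression, sentence) pairs that share it."""
    groups = defaultdict(list)
    for _reading, audio_id, expression, sentence in rows:
        groups[audio_id].append((expression, sentence))
    return dict(groups)


groups = group_by_audio(rows)
for audio_id, entries in groups.items():
    # e.g. 00027a 4 ['会う', '合う', '逢う', '遭う']
    print(audio_id, len(entries), [expr for expr, _ in entries])
```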

Let me know if you need more.

Reply #49 - 2011 January 25, 7:53 pm
overture2112 Member
From: New York Registered: 2010-05-16 Posts: 400

Aye, that makes the issue quite clear. I've added a phoneticIndex field so everything can now be sorted easily in a spreadsheet.

So now for each sentence we can determine a list of possible audio files, but can we narrow it down further? Are the sentences in some well-known order? I'm not very good at determining phonetic order manually, and the library I used doesn't support kanji, but perhaps someone else can check?
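
The reading list earlier in the thread suggests that the jw file numbers follow dictionary order. In that order, voicing marks are ignored on the first comparison: とうき is jw06227, どうき is jw06233, and とうさん is jw06248. The sketch below is a kana-only sort key that mimics this. It might help with checking the ordering, but kanji would still have to be converted to readings first, and it does not do that. It has only been checked against a handful of entries, and real dictionary collation has more rules. All names here are hypothetical.

```python
import unicodedata

# Small kana are compared as their full-size forms on the first pass.
SMALL_TO_LARGE = str.maketrans("ぁぃぅぇぉっゃゅょゎ", "あいうえおつやゆよわ")

# Kana grouped by vowel, used to expand the long-vowel mark ー (ぺー -> ぺえ).
VOWEL_ROWS = {
    "あ": "あかさたなはまやらわ",
    "い": "いきしちにひみり",
    "う": "うくすつぬふむゆる",
    "え": "えけせてねへめれ",
    "お": "おこそとのほもよろを",
}
VOWEL_OF = {kana: vowel for vowel, row in VOWEL_ROWS.items() for kana in row}


def to_hiragana(text):
    """Katakana ァ..ヶ (U+30A1..U+30F6) sit exactly 0x60 above their hiragana."""
    return "".join(chr(ord(c) - 0x60) if "ァ" <= c <= "ヶ" else c for c in text)


def primary(text):
    """First-pass key: hiragana, no voicing marks, full-size kana, ー expanded."""
    # NFD splits e.g. が into か + U+3099, so the marks can simply be dropped.
    decomposed = unicodedata.normalize("NFD", to_hiragana(text))
    base = "".join(c for c in decomposed if c not in "\u3099\u309a")
    base = base.translate(SMALL_TO_LARGE)
    out = []
    for c in base:
        if c == "ー" and out:
            c = VOWEL_OF.get(out[-1], c)
        out.append(c)
    return "".join(out)


def phonetic_key(reading):
    """Ignore voicing first, then break ties on the original (unvoiced first)."""
    return (primary(reading), to_hiragana(reading))


# Audio numbers copied from the reading list earlier in the thread.
audio_no = {"とうさん": 6248, "どうき": 6233, "とうき": 6227, "ページ": 8307,
            "へいこう": 8290, "ぺこぺこ": 8309, "つぎ": 5834, "つき": 5833}
ordered = sorted(audio_no, key=phonetic_key)
print(ordered)
print([audio_no[r] for r in ordered] == sorted(audio_no.values()))  # True
```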

Also, there's still the issue of the word audio often being off by 1-3 positions (and I confirmed a few cases of it being off by -1, i.e., one position back rather than always forward, as I originally observed).
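
Given the observed error range (one position back to three forward), each guessed word-audio file can be widened into a short list of candidates to listen through. This is a sketch. It assumes every file name follows the `jwNNNNNa.m4a` pattern seen in the list earlier in the thread. `candidate_audio` and its `back`/`forward` defaults are illustrative and not part of any existing tool.

```python
import re


def candidate_audio(filename, back=1, forward=3):
    """List word-audio files from `back` positions before to `forward` after a guess."""
    m = re.fullmatch(r"jw(\d{5})a\.m4a", filename)
    if m is None:
        raise ValueError(f"unexpected audio file name: {filename!r}")
    n = int(m.group(1))
    # Skip non-positive numbers so the window never runs off the start.
    return [f"jw{k:05d}a.m4a" for k in range(n - back, n + forward + 1) if k > 0]


print(candidate_audio("jw05849a.m4a"))
```

This still leaves five candidates per word. Picking the right one would need a listening check or speech recognition.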

Last edited by overture2112 (2011 January 25, 7:55 pm)

Reply #50 - 2011 January 25, 9:16 pm
Matthias Member
From: Germany Registered: 2005-10-27 Posts: 37

There are 60 sentences that are used multiple times:

Example: きりん の 首 は 長い。

It is used once for 麒麟 (5331 A) and twice for 首 (44 B and 44 S). So reusing a sentence for different words can make sense, but the 44 S entry could be removed.
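
Both checks described above can be automated: finding reused sentences, and spotting a sentence attached to the same word in more than one pack. Below is a sketch. It assumes each record is a (sentence, expression, index, pack) tuple. The records are just the きりん example, and the function names are hypothetical.

```python
from collections import Counter, defaultdict

# Example records: (sentence, expression, index number, pack)
records = [
    ("きりん の 首 は 長い。", "麒麟", 5331, "A"),
    ("きりん の 首 は 長い。", "首", 44, "B"),
    ("きりん の 首 は 長い。", "首", 44, "S"),
]


def reused_sentences(records):
    """Sentences that appear in more than one record, with their use counts."""
    counts = Counter(sentence for sentence, *_ in records)
    return {s: n for s, n in counts.items() if n > 1}


def redundant_uses(records):
    """Same sentence attached to the same word (expression + index) in several packs."""
    by_word = defaultdict(list)
    for sentence, expression, index, pack in records:
        by_word[(sentence, expression, index)].append(pack)
    return [(s, e, i, sorted(packs)) for (s, e, i), packs in by_word.items()
            if len(packs) > 1]


print(reused_sentences(records))  # {'きりん の 首 は 長い。': 3}
print(redundant_uses(records))    # [('きりん の 首 は 長い。', '首', 44, ['B', 'S'])]
```

The code only flags candidates. Deciding which copy (B or S) to drop is still a manual call.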