kanji koohii FORUM
Tanuki-Ultima (Anki Deck Released! Has 6,200 Sentence J<->J;)


Tanuki-Ultima (Anki Deck Released! Has 6,200 Sentence J<->J;) - finalfantasy6forever - 2011-03-08

Hey guys!

I'm so happy that you guys were able to look at the deck. The information you provided is very accurate. Yes, my tutor is a Japanese teacher (and a translator IRL). We both took the spreadsheets and typed in all of the correct kanji by hand for the 4 fields in the deck. All the other original kanji/reading information was left in its original format (to save us from getting carpal tunnel syndrome :P).

It was a lot of work and took a few months to complete. In the meantime, I was studying RTK while she would do several hundred rows at a time. When she completed it, we saved the spreadsheet and converted it to a basic Anki deck. Optimizing the deck with the menus makes it much easier.

We hope that this deck helps future users, including AntiMoon/AJATT sentence method users. If someone would like to "immortalize" the deck by distributing it as an Excel or Google spreadsheet file, feel free to do so! We would like the data to be free forever to everyone!

Best of luck with your studies, and we hope to see users completing J-J decks! It's a very good feeling jumping straight into Japanese-to-Japanese, and the sentences are simple!!! :)

Thank you to everyone in the community! Have a wonderful day!!!!!!!!!!!


Tanuki-Ultima (Anki Deck Released! Has 6,200 Sentence J<->J;) - Nukemarine - 2011-03-08

Thora Wrote:
jettyke Wrote:and does this deck have that thing that 1 new word/expression per card?
Nukemarine shared a Google spreadsheet which is apparently sorted per 2001KO and has the Core6000(?) vocab tagged. I don't know whether it was sorted on words or sentences. In any event, that version wasn't completely kanjified, which might affect the sort. You could start with that order. Or perhaps Nukemarine could work his magic again with the revised data :-)
For some reason, I'm having problems with Cangy's sorting program. For the RTK video lessons I'm thinking about making, I also wanted to introduce common words that use kanji up to the point that I taught (KO2k1 pt 1 kanji introduced in RTK order). A sorted list would have been perfect.

Anyway, if it's not working for that, it's probably not going to work with a "kanjified" tanuki list. However, Cangy's program isn't terribly difficult to use. Perhaps someone else can sort the word list with KO2k1 list.

For those asking if it's good to use this post Core 2k, I recommend trying their hand at subs2srs instead. Might be more fun than just doing another vocab list. After one or two shows, try your hand at a list.


Tanuki-Ultima (Anki Deck Released! Has 6,200 Sentence J<->J;) - Katsuo - 2011-03-08

Thora Wrote:-KIC covers all Jouyou kanji (not certain about Tanuki).
Tanuki (corpus), like KiC, is based on the old Joyo kanji list of 1,945 characters and covers all of them (new Joyo list released last year has 2,136 characters). Tanuki and KiC are in the same basic order but KiC adds two non-Joyo characters (nos. 81 誰, & 1,226 賂) which changes the numbering a little and gives a total of 1,947.

I think this data would be even better if the explanation phrases were completely transcribed into kana. In many cases the phrases contain the target kanji but its reading isn't given. E.g. 名
Example sentence (kanji) ぼくの名字は、「山本」です。
Example sentence (kana) ぼくの名じは、「やまもと」です。
Explanation phrase (kanji) 家の呼び名
Explanation phrase (kana) いえのよび名
Word (kanji) 名字
Word (kana) みょうじ

So in the explanation phrase the reading of 名 (な) isn't specified.


Tanuki-Ultima (Anki Deck Released! Has 6,200 Sentence J<->J;) - Thora - 2011-03-08

Oh right. The "new jouyou" is the new "jouyou". Thanks for clarifying that.

So Tanuki was in KIC order. Odd that it used such a different set of words.
(I'm wondering if I had a different version of Tanuki. I don't recall any numbering or ordering by kanji. The words were all like this too: みょう字 名(じ))

@Nukemarine. Thanks. Didn't mean to put you on the spot like that. :-)
I don't really see this as something to jump into after core2000, either. At least not all of it. And certainly not as the main meal. This is pure kanji vocab learning and nothing else.


Tanuki-Ultima (Anki Deck Released! Has 6,200 Sentence J<->J;) - jettyke - 2011-03-08

Thora Wrote:I don't really see this as something to jump into after core2000, either. At least not all of it. And certainly not as the main meal. This is pure kanji vocab learning and nothing else.
But how about "something to jump into after core6k ?"
Right now I'm thinking about what to do after I finish 6k.

My choices are:

1) Find that incomplete weird core 10k deck. And somehow magically delete all cards that were in 6k and do it.

2) Go straight to reading. I'll maybe start reading junior high school level novels and will add cards with yomichan, and use no lists anymore

3) Do this deck here

4) Do the BIG core plus deck's leftover JLPT words that were not in core. (maybe ~5k words)


Tanuki-Ultima (Anki Deck Released! Has 6,200 Sentence J<->J;) - pm215 - 2011-03-08

jettyke Wrote:But how about "something to jump into after core6k ?"
Right now I'm thinking about what to do after I finish 6k.
If you've done core6k then you definitely have enough vocab. Start actually doing some reading (and also grammar if you haven't done much in that direction), but you've got easily enough vocab to get started with.

I still haven't finished working through core6k :-)


Tanuki-Ultima (Anki Deck Released! Has 6,200 Sentence J<->J;) - radical_tyro - 2011-03-08

pm215 Wrote:
jettyke Wrote:But how about "something to jump into after core6k ?"
Right now I'm thinking about what to do after I finish 6k.
If you've done core6k then you definitely have enough vocab. Start actually doing some reading (and also grammar if you haven't done much in that direction), but you've got easily enough vocab to get started with.

I still haven't finished working through core6k :-)
i finished core6k and i don't find it to be 'enough' in the sense that whenever i read native material i come across loads of new words. this is one way of increasing vocab but it can be time intensive. i'm also considering the other options jettyke mentioned, with a personal preference for clearing out the JLPT lists so i have a better shot at N1.


Tanuki-Ultima (Anki Deck Released! Has 6,200 Sentence J<->J;) - juniperpansy - 2011-03-08

Thanks for this, ff6. It is greatly appreciated :)
I am 2700 sentences into the core series. This deck looks like it might be a little tough for me, but I will definitely utilize it to the best of my ability.
Thanks again :)

edit: Has anybody been able to improve the card layout for this deck? I think it should be possible to display the kana readings above the kanji in the answer, but I can't seem to do it. Any ideas? :)


Tanuki-Ultima (Anki Deck Released! Has 6,200 Sentence J<->J;) - animehunter123 - 2011-03-09

Hi Juniper,

Yes, this deck is amazing! I was able to display the kana readings above the kanji by exporting the deck to a text file and using a sed/vim command to enclose the kana readings in brackets, like so:

example:
漢字コーヒーが超大好きだぜ!
original kana reading: かんじこーひーがちょうだいすきだぜ!
new kana should be: [かんじこーひーがちょうだいすきだぜ!]


With the new brackets, and the field renamed to "Reading" just like in the basic Japanese layout in Anki, you should be able to get the hiragana displayed above the kanji :)

Best of luck!
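
For anyone without sed or vim handy, the bracketing step described above could be sketched in Python along these lines. This is only an illustration of the idea, not the command actually used in the post: the function name `bracket_field` and the assumption that the kana reading is the second tab-separated column are made up for the example.

```python
def bracket_field(line: str, index: int) -> str:
    """Wrap one tab-separated field in [ ] (skips empty or already bracketed fields)."""
    fields = line.rstrip("\n").split("\t")
    if index < len(fields):
        value = fields[index]
        already_done = value.startswith("[") and value.endswith("]")
        if value and not already_done:
            fields[index] = "[" + value + "]"
    return "\t".join(fields)


# Assumption: column 0 is the kanji sentence, column 1 is its kana reading.
row = "漢字コーヒーが超大好きだぜ!\tかんじこーひーがちょうだいすきだぜ!"
print(bracket_field(row, 1))
```

Running the function over every line of the exported file and re-importing the result should give the same bracketed readings as the sed/vim approach; the column index would need adjusting to wherever the reading field sits in your own export.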


Tanuki-Ultima (Anki Deck Released! Has 6,200 Sentence J<->J;) - radical_tyro - 2011-03-09

this is an interesting deck. thanks for sharing!


Tanuki-Ultima (Anki Deck Released! Has 6,200 Sentence J<->J;) - pm215 - 2011-03-09

radical_tyro Wrote:
pm215 Wrote:If you've done core6k then you definitely have enough vocab. Start actually doing some reading (and also grammar if you haven't done much in that direction), but you've got easily enough vocab to get started with.

I still haven't finished working through core6k :-)
i finished core6k and i don't find it to be 'enough' in the sense that whenever i read native material i come across loads of new words.
Well, sure, but the point is that you start reading and build up your vocab in the background. If you've learned 6000+ words but you're still not reading "junior high school level novels" yet then something is weirdly out of balance, and working through another enormous vocab list is not really the answer IMHO.


Tanuki-Ultima (Anki Deck Released! Has 6,200 Sentence J<->J;) - nest0r - 2011-03-13

As per recent comments in the ‘What's this word/phrase?’ thread, for some cards the Vocabulary Word (Kanji) field gives the kana reading in parentheses, and sometimes those readings are incorrect. This is not an artefact from the original Tanuki corpus. Also, sometimes the parenthesized element is kanjification rather than kana.

Example: 一畝 has the reading ひとうね parenthesized beside it in the Vocabulary Word (Kanji) field, but the actual reading and meaning given in other fields is いっせ. The せ here is important to know because in definitions it's the reading for the (obsolete?) unit of measurement, as opposed to うね for ‘ridge’. The partial hints, example sentences, and definitions make it clear (despite the use of the English keyword ‘ridge’) that せ/unit of measurement is the correct one.


Tanuki-Ultima (Anki Deck Released! Has 6,200 Sentence J<->J;) - nest0r - 2011-03-13

This regex (UltraEdit, the Perl option checked) seems to correctly convert the symbols in the original Tanuki text (previously linked) to ruby markup for the browser (Firefox + Ruby plugin).

Find: ▼*([\x{4e00}-\x{9fa5}]{0,1})@([\x{3040}-\x{30FF}]+)▲
Replace: <ruby><rb>\1</rb><rp>@</rp><rt>\2</rt><rp>▲</rp></ruby>
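
A rough Python port of the same find/replace, for anyone who doesn't have UltraEdit. This is a sketch under assumptions: the `\x{...}` escapes are rewritten as `\u....` because that is the form Python's `re` module accepts, and the sample string is made up to look like the ▼kanji@kana▲ markup rather than copied from the Tanuki text.

```python
import re

# Same pattern as the Find line above, with Perl's \x{4e00} written as \u4e00.
TANUKI_RUBY = re.compile(r"▼*([\u4e00-\u9fa5]?)@([\u3040-\u30ff]+)▲")

# Same replacement as the Replace line above.
RUBY_MARKUP = r"<ruby><rb>\1</rb><rp>@</rp><rt>\2</rt><rp>▲</rp></ruby>"


def tanuki_to_ruby(text: str) -> str:
    """Convert ▼kanji@kana▲ spans into HTML ruby markup."""
    return TANUKI_RUBY.sub(RUBY_MARKUP, text)


# Invented sample in the ▼kanji@kana▲ style.
print(tanuki_to_ruby("家の呼び▼名@な▲"))
```

Like the original pattern, this only captures a single kanji directly before the @, so a multi-kanji base would need a wider capture group.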


Tanuki-Ultima (Anki Deck Released! Has 6,200 Sentence J<->J;) - nest0r - 2011-03-15

Oh! I just experimented and found that Anki supports ruby! You stick the markup in the area with stuff like {{Front}} in it. I tested it with cards where I had the expression and reading in separate fields, then used the expression field {{Kanji}} as the base in the ruby markup, etc. There must be a way to integrate this with Tanuki using above find/replace, so that the examples/words are rubified, no?

Only problem is the ruby is sized improperly if you use the reading field as the furigana, not sure how to adjust that.

Edit 1:

Tinkering with the ruby size via webkit CSS (using the plugin) worked but not for when using the fields as input, i.e. Anki's field font sizing overrides the ruby CSS, so the best workaround seems to be to halve the size of the reading field. That works fine since the reading field would be purely used for the ruby. (So perhaps better label it something more specifically related to the ruby markup.)

Of course, this has now become somewhat tangential to the Tanuki markup. But at least now we know we can customize furigana a bit more than Anki's usual furigana generation (I think).

Edit 1.1: For instance, you can have a different colour as well. And of course, make whatever fields you want be the ruby base/text.

Edit 2: One tentative usage for something like Tanuki, perhaps, could be to use its built-in ruby markup in a text to create delimiters for import into Anki, mapping the respective ruby base and ruby text fields, and then making sure the Front and Back of the card has relevant ruby markup factoring in those two fields.

Edit 3: Ruby markup is basically:

<ruby><rb>{{Kanji}}</rb><rt>{{Kana}}</rt></ruby> - This simple form works for above purposes, though usage of more markup such as <rp> could work.


Tanuki-Ultima (Anki Deck Released! Has 6,200 Sentence J<->J;) - nest0r - 2011-03-19

So I've been using this deck to experiment with layouts. In case anyone was wondering, I decided it was too much trouble to convert the rubi from the original Tanuki to tab-delimited fields for import into Anki (in order to use the ruby markup in the layout), because the regex was more than I knew how to do (it would've needed fields for kanji/kana in pre and post ruby areas, consecutive numbering for multiple ruby base/text fields, etc.)

However, since some of the fields are just single words, and I've set up this Tanuki deck as a vocabulary deck (added an audio field for Third's JDIC plugin, also), I decided to go ahead and rubify what is in the deck, focusing mostly on the words.

Here's the current setup (Tools→Deck Properties→Edit→Card Layout):

Question: {{Vocabulary Word (Kanji)}}

Answer:

<ruby><rb>{{Vocabulary Word (Kanji)}}</rb><rt>{{Vocabulary Word (Kana)}}</rt></ruby>
<p>
<ruby><rb>{{Japanese - Definition (Kanji)}}</rb><rt>{{Japanese - Definition (Kana)}}</rt></ruby>
<p><ruby><rb>{{Example Sentence (Kanji)}}</rb><rt>{{Example Sentence (Kana)}}</rt></ruby>
{{Audio}}

Each of those (Kanji) fields is set up with a size 30 font, and each (Kana) field with size 18. Normally <ruby> automatically sets the furigana at 50-60% of the size of the kanji beneath it, but Anki's fields override that, so you customize their font/color as you normally would for Anki's fields.

As you can see, I went ahead and rubified those paired fields beyond Vocabulary Word. It's not as neat, especially as the fields get longer, since the ruby isn't divided per character; perhaps someone more tech savvy than I am can start incorporating ruby better in Anki.

Edit: Oh, and I did a Find/Replace in the Anki browser (targeting the Vocabulary Word (Kanji) field) to get rid of the parenthetical stuff. I think it was Find: (*$ w/ regex enabled and replaced w/ blank? I forget.
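
Since the exact pattern is only half-remembered above, here is a hedged Python sketch of what "get rid of the parenthetical stuff" could look like outside Anki. The pattern, the function name `strip_parenthetical`, and the guess that the field looks like 一畝(ひとうね) with either half-width or full-width parentheses are all assumptions, not the find/replace that was actually run.

```python
import re

# Assumption: the unwanted part is one trailing parenthesized chunk,
# written with either ASCII ( ) or full-width （ ） parentheses.
PAREN_TAIL = re.compile(r"\s*[(（][^()（）]*[)）]\s*$")


def strip_parenthetical(value: str) -> str:
    """Remove a trailing parenthesized reading from a vocabulary field."""
    return PAREN_TAIL.sub("", value)


print(strip_parenthetical("一畝(ひとうね)"))
```

Fields without a trailing parenthetical are left untouched, so it should be safe to run over the whole Vocabulary Word (Kanji) column, but not over fields where parentheses carry meaning.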


Tanuki-Ultima (Anki Deck Released! Has 6,200 Sentence J<->J;) - nest0r - 2011-03-19

Okay, having tinkered with the Furigana plugin, for this deck it might be easier to just use it instead. (The above ruby markup was more a proof-of-concept, but I do think it's better to use the above in general since you don't have to edit fields because the markup's in the layout not the field, and you can have different coloured kanji and ruby, etc.)

I took the original Tanuki text and imported it into Anki as a 29 field deck with the fields named after the appropriate headings in the original Tanuki text, then got rid of all of the ▼ using Find/Replace in the card browser (after optimizing database), then replaced the @ and ▲ in all fields with [ and ], respectively. Edit: Replace ▼ with a space instead of just blanking it out.
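
The three replacements described above (▼ to a space, @ to [, ▲ to ]) could also be done in one pass before importing. A minimal sketch, assuming the text has already been read into a Python string; the sample is made up in the ▼kanji@kana▲ style rather than taken from the Tanuki file.

```python
# Character-for-character mapping taken from the steps in the post.
TANUKI_TO_BRACKETS = str.maketrans({"▼": " ", "@": "[", "▲": "]"})


def to_bracket_furigana(text: str) -> str:
    """Turn ▼kanji@kana▲ markup into ' kanji[kana]' bracketed furigana."""
    return text.translate(TANUKI_TO_BRACKETS)


# Invented sample string.
print(to_bracket_furigana("家の呼び▼名@な▲"))
```

One caveat: this replaces every @ in the text, so it assumes @ never appears in the fields for any other reason.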

Since only one MeaningDistractor field is correct and I had a bit of a brain glitch, I searched in the browser for: ‘MeaningAnswer:1’, selected all those cards, and Find/Replaced (Find: .*$, enabled regex, left Replace blank) for the MeaningDistractor2 fields (all 3 of them), then did the same with the cards that came up in the search ‘MeaningAnswer:2’, blanking their MeaningDistractor1 fields. That way I could display both in the Card Layout section without having to worry about displaying the wrong distractor (I display the Distractor rather than CorrectAnswer fields in the layout because they have the rubi markup that was converted to bracketed furigana). I'm sure there's a better way to do that so there are no empty fields in the cards.

Oh, and I found I needed to edit the furigana.py to make sure the ruby alignment was Right instead of Center (and set FURI_OUTSIDE_READING = True so all fields will display furigana).

Play with the layout how you like and voila, you now have Tanuki with rubified Words, Meanings, and Sentences. It's not Ultima, but surely the fields can be combined.

Only problem now is that the furigana plugin seems to display the hover/highlight/tooltip thing on non-Question fields i.e. all fields w/ readings, and the lack of markup to tell the furigana which preceding kanji is the boundary, which might be throwing off the spacing for some of the compounds. Edit: Ahha, the furigana plugin uses a space preceding the base kanji, so replacing the ▼ with a space might work.

Edit: Anki's Update button is nice. You could probably add the three fields in the original Tanuki that have the rubi markup.

Final edit: Actually I think ruby (either with the markup or the plugin) looks best without individually doing each reading, so in that sense the initial mod to tanuki-ultima actually looks better, and works better for selecting the text.


Tanuki-Ultima (Anki Deck Released! Has 6,200 Sentence J<->J;) - Tori-kun - 2011-03-19

Thora Wrote:Nukemarine shared a google spreadsheet which is apparently sorted per 2001KO and has the Core6000(?) vocab tagged.
Just wanted to ask for the link to that particular spreadsheet; I have a few links, but I'm not sure if these are the right ones, hm. Also love the name, tanuki @Thora :D


Tanuki-Ultima (Anki Deck Released! Has 6,200 Sentence J<->J;) - Irixmark - 2011-03-19

jettyke Wrote:1) Find that incomplete weird core 10k deck. And somehow magically delete all cards that were in 6k and do it.
We're still working on that incomplete weird core 10k deck, but could use a bit of help because it is taking ages.

http://forum.koohii.com/showthread.php?tid=7104&page=3

Deleting all cards that were in 6k is easy because the spreadsheet has a column with the Core number. Magic, isn't it?
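
As a sketch of that filtering step, assuming the spreadsheet is exported as CSV: the column header `Core`, the rule that a number up to 6000 means "already in Core 6k", and the sample rows are all assumptions for illustration, not the real sheet's layout or data.

```python
import csv
import io

CORE_COLUMN = "Core"  # hypothetical header name for the Core number column
CORE_LIMIT = 6000     # assumption: numbers up to 6000 are Core 6k items


def not_in_core6k(row: dict) -> bool:
    """Keep rows with no Core number or a Core number above the limit."""
    value = (row.get(CORE_COLUMN) or "").strip()
    if not value.isdigit():
        return True
    return int(value) > CORE_LIMIT


def filter_rows(csv_text: str) -> list:
    """Parse CSV text and drop the rows that are already in Core 6k."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if not_in_core6k(row)]


# Made-up rows; the numbers are placeholders, not real Core indices.
sample = "Word,Core\n食べる,120\n畝,\n憂鬱,7450\n"
print([row["Word"] for row in filter_rows(sample)])
```

The surviving rows can then be written back out with `csv.DictWriter` and imported into Anki as usual.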


Tanuki-Ultima (Anki Deck Released! Has 6,200 Sentence J<->J;) - Thora - 2011-03-19

Tori-kun Wrote:Just wanted to ask for the link to that particular spreadsheet; have a few links, but i'm not sure if these are the right ones, hm.
This post has a link to the (pre-revised) Tanuki list sorted by vocab.


Tanuki-Ultima (Anki Deck Released! Has 6,200 Sentence J<->J;) - Nihonnub - 2011-03-25

I'd like to chime in and say thanks for your hard work on this deck! Looking forward to trying it out :D


Tanuki-Ultima (Anki Deck Released! Has 6,200 Sentence J<->J;) - Shakunatz - 2011-04-14

I haven't read the whole thread, but I happened to find a mistake in sentence 848.
I think that sentence should be written as 「弾が敵を外れてしまった。」 and not as 「弾が的を外れてしまった。」.


Tanuki-Ultima (Anki Deck Released! Has 6,200 Sentence J<->J;) - Thora - 2011-04-14

The intended meaning might have been 的 (まと) "miss the target" which is okay.

edit: btw, I looked at ~100 cards and noticed that there may be a few more possible "kanjifications" (for consistency or words typically in kanji.) Just something to keep an eye out for.


Tanuki-Ultima (Anki Deck Released! Has 6,200 Sentence J<->J;) - nest0r - 2011-06-08

Something fun: If you're using overture's morphology plugin, use the Japanese (kanji) definition in this deck as your Expression field to set your iPlusN/unknowns (just hit F2 and rename the field, then rename it back when you're done) and sort by iPlusN in the card browser to unsuspend and review the cards with the easiest definitions first to get a smoother monolingual experience.


Tanuki-Ultima (Anki Deck Released! Has 6,200 Sentence J<->J;) - animehunter123 - 2011-06-30

Does anyone know how to do an advanced regex edit that highlights in PURPLE, underlines and bolds all of the vocab words in this deck?

For example, if you export the facts from Anki into a tab-delimited file, you get:

借りる かりる 返す約束で人の物を使う。 かえすやくそくでひとのものをつかう。 兄から帽子を借りる。 あにからぼうしを借りる。 借 しゃく か(りる) borrow,rent 訓


But I want to change it into the below (and for multiple times per line if the word is multiple times) to:

<span style="font-weight:600; text-decoration: underline; color:#e418ff;">借りる</span> かりる 返す約束で人の物を使う。 かえすやくそくでひとのものをつかう。 兄から帽子を借りる。 あにからぼうしを<span style="font-weight:600; text-decoration: underline; color:#e418ff;">借りる</span>。 借 しゃく か(>



Anyone know how to do this? I've been hitting my head against the wall for a few hours trying to nail this down. Here is the closest I've got with vim or sed:

insert a ';' after the vocab word...
sed 's#\t#;\t#2' tanukiFACTS.txt > tanukiFACTS.txt-v2

use vim to.... painfully and (not entirely working) this command:

:%s%^\(...^I\)\(.*\);\(.*\)\2\(.*\)%\1<span style="font-weight:600; text-decoration: underline; color:#e418ff;">\2</span>\3<span style="font-weight:600; text-decoration: underline; color:#e418ff;">\2</span>\4



urghghhlll.... help... XD


Tanuki-Ultima (Anki Deck Released! Has 6,200 Sentence J<->J;) - wccrawford - 2011-06-30

I don't know of any way to make the search pattern change for each line like that. You're actually doing a double search: first to determine the word, then to find the instances of that word in the line.

Instead, I would write a quick script (PHP, Perl, Ruby, etc.) and process the file line by line.
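
A minimal sketch of such a line-by-line script, here in Python. It assumes the export is UTF-8 and tab-delimited with the vocab word as the first field, and it wraps every occurrence of that word in every field of the line; the span style is copied from the question, while the function names and the output file name are made up for the example.

```python
SPAN_OPEN = '<span style="font-weight:600; text-decoration: underline; color:#e418ff;">'
SPAN_CLOSE = "</span>"


def highlight_line(line: str) -> str:
    """Wrap every occurrence of the first field (the vocab word) in a styled span."""
    fields = line.rstrip("\n").split("\t")
    word = fields[0]
    if not word:
        return "\t".join(fields)
    marked = SPAN_OPEN + word + SPAN_CLOSE
    return "\t".join(field.replace(word, marked) for field in fields)


def highlight_file(src: str, dst: str) -> None:
    """Process a tab-delimited export one line at a time."""
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            fout.write(highlight_line(line) + "\n")


# Shortened sample row; fields are separated by real tab characters.
print(highlight_line("借りる\tかりる\t兄から帽子を借りる。"))

# Hypothetical usage on the exported file:
# highlight_file("tanukiFACTS.txt", "tanukiFACTS-highlighted.txt")
```

Because the word is read fresh from each line, this sidesteps the "double search" problem entirely. If only certain fields should be highlighted, restrict the replacement to those column indexes instead of looping over all fields.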