Joined: Feb 2007
Posts: 1,558
Thanks:
0
Oh right. The "new jouyou" is the new "jouyou". Thanks for clarifying that.
So Tanuki was in KIC order. Odd that it used such a different set of words.
(I'm wondering if I had a different version of Tanuki. I don't recall any numbering or ordering by kanji. The words were all like this too: みょう字 名(じ))
@Nukemarine. Thanks. Didn't mean to put you on the spot like that. :-)
I don't really see this as something to jump into after core2000, either. At least not all of it. And certainly not as the main meal. This is pure kanji vocab learning and nothing else.
Joined: Nov 2005
Posts: 269
Thanks:
0
this is an interesting deck. thanks for sharing!
Joined: Oct 2007
Posts: 4,582
Thanks:
0
As per recent comments in the ‘What's this word/phrase?’ thread, for some cards the Vocabulary Word (Kanji) field gives the kana reading in parentheses, and sometimes those readings are incorrect. This is not an artefact from the original Tanuki corpus. Also, sometimes the parenthesized element is kanjification rather than kana.
Example: 一畝 has the reading ひとうね parenthesized beside it in the Vocabulary Word (Kanji) field, but the actual reading and meaning given in other fields is いっせ. The せ here is important to know because in definitions it's the reading for the (obsolete?) unit of measurement, as opposed to うね for ‘ridge’. The partial hints, example sentences, and definitions make it clear (despite the use of the English keyword ‘ridge’) that せ/unit of measurement is the correct one.
Edited: 2011-03-13, 6:56 pm
Joined: Oct 2007
Posts: 4,582
Thanks:
0
This regex (UltraEdit, the Perl option checked) seems to correctly convert the symbols in the original Tanuki text (previously linked) to ruby markup for the browser (Firefox + Ruby plugin).
Find: ▼*([\x{4e00}-\x{9fa5}]{0,1})@([\x{3040}-\x{30FF}]+)▲
Replace: <ruby><rb>\1</rb><rp>@</rp><rt>\2</rt><rp>▲</rp></ruby>
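For anyone without UltraEdit, here is a rough Python sketch of the same kind of conversion. It assumes the markup has the shape ▼base@reading▲ (e.g. ▼借@か▲りる), which is my reading of the symbols rather than anything confirmed against the full Tanuki text. Two deliberate differences from the pattern above: it allows one or more kanji in the base instead of at most one, and it puts plain parentheses in the <rp> fallbacks. The function name is made up for illustration.

```python
import re

# Assumed markup shape: optional ▼, a run of kanji (ruby base), @, a run of kana, ▲.
# Python's re module writes code points as \uXXXX rather than \x{XXXX}.
TANUKI_RUBY = re.compile(r"▼?([\u4e00-\u9fa5]+)@([\u3040-\u30ff]+)▲")

# Parentheses in <rp> are a fallback for browsers without ruby support (an assumption,
# the pattern posted above uses the original symbols instead).
RUBY_TEMPLATE = r"<ruby><rb>\1</rb><rp>(</rp><rt>\2</rt><rp>)</rp></ruby>"


def tanuki_to_ruby(text: str) -> str:
    """Replace every ▼base@reading▲ group in text with HTML ruby markup."""
    return TANUKI_RUBY.sub(RUBY_TEMPLATE, text)


if __name__ == "__main__":
    print(tanuki_to_ruby("兄から▼帽子@ぼうし▲を▼借@か▲りる。"))
```

Note that if a ▼ is missing and the base is directly preceded by other kanji, those kanji get pulled into the base as well, so the ▼ boundary marker matters.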
Joined: Oct 2007
Posts: 4,582
Thanks:
0
Oh! I just experimented and found that Anki supports ruby! You stick the markup in the area with stuff like {{Front}} in it. I tested it with cards where I had the expression and reading in separate fields, then used the expression field {{Kanji}} as the base in the ruby markup, etc. There must be a way to integrate this with Tanuki using above find/replace, so that the examples/words are rubified, no?
The only problem is that the ruby is sized improperly if you use the reading field as the furigana; I'm not sure how to adjust that.
Edit 1:
Tinkering with the ruby size via WebKit CSS (using the plugin) worked, but not when the fields are used as input: Anki's field font sizing overrides the ruby CSS, so the best workaround seems to be to halve the size of the reading field. That works fine, since the reading field would be used purely for the ruby. (So perhaps it's better to label it something more specifically related to the ruby markup.)
Of course, this has now become somewhat tangential to the Tanuki markup. But at least now we know we can customize furigana a bit more than Anki's usual furigana generation (I think).
Edit 1.1: For instance, you can have a different colour as well. And of course, make whatever fields you want be the ruby base/text.
Edit 2: One tentative usage for something like Tanuki, perhaps, could be to use its built-in ruby markup in a text to create delimiters for import into Anki, mapping the respective ruby base and ruby text fields, and then making sure the Front and Back of the card has relevant ruby markup factoring in those two fields.
Edit 3: Ruby markup is basically:
<ruby><rb>{{Kanji}}</rb><rt>{{Kana}}</rt></ruby> - This simple form works for above purposes, though usage of more markup such as <rp> could work.
Edited: 2011-03-15, 6:08 am
Joined: Oct 2007
Posts: 4,582
Thanks:
0
So I've been using this deck to experiment with layouts. In case anyone was wondering, I decided it was too much trouble to convert the rubi from the original Tanuki to tab-delimited fields for import into Anki (in order to use the ruby markup in the layout), because the regex was more than I knew how to do (it would've needed fields for kanji/kana in pre and post ruby areas, consecutive numbering for multiple ruby base/text fields, etc.)
However, since some of the fields are just single words, and I've set up this Tanuki deck as a vocabulary deck (added an audio field for Third's JDIC plugin, also), I decided to go ahead and rubify what is in the deck, focusing mostly on the words.
Here's the current setup (Tools→Deck Properties→Edit→Card Layout):
Question: {{Vocabulary Word (Kanji)}}
Answer:
<ruby><rb>{{Vocabulary Word (Kanji)}}</rb><rt>{{Vocabulary Word (Kana)}}</rt></ruby>
<p>
<ruby><rb>{{Japanese - Definition (Kanji)}}</rb><rt>{{Japanese - Definition (Kana)}}</rt></ruby>
<p><ruby><rb>{{Example Sentence (Kanji)}}</rb><rt>{{Example Sentence (Kana)}}</rt></ruby>
{{Audio}}
Each of those (Kanji) fields is set up with a size 30 font, and the (Kana) fields with size 18. Normally <ruby> automatically sets the furigana at 50-60% of the size of the kanji beneath it, but Anki's fields override that, so you customize their font/color as you normally would for Anki's fields.
As you can see, I went ahead and rubified those paired fields beyond Vocabulary Word. It's not as neat especially as they get longer, since it's not divided per character; perhaps someone more tech savvy than I can start incorporating ruby better in Anki.
Edit: Oh, and I did a Find/Replace in the Anki browser (targeting the Vocabulary Word (Kanji) field) to get rid of the parenthetical stuff. I think it was Find: (*$ w/ regex enabled and replaced w/ blank? I forget.
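Since the exact pattern is forgotten above, here is a hedged Python sketch of one way to strip the parenthetical outside Anki, for example on an exported text file. It assumes the parenthetical always runs from the first opening parenthesis (half-width or full-width) to the end of the field; the function name and sample values are only illustrative.

```python
import re

# Assumption: everything from the first "(" or "（" to the end of the field is the
# parenthetical reading/kanjification, plus any whitespace just before it.
PAREN_TAIL = re.compile(r"\s*[(（].*$")


def strip_parenthetical(value: str) -> str:
    """Return the field value with a trailing parenthetical removed."""
    return PAREN_TAIL.sub("", value)


if __name__ == "__main__":
    for sample in ["一畝(ひとうね)", "借りる"]:
        print(strip_parenthetical(sample))
```

Fields without a parenthesis pass through unchanged, so it should be safe to run over the whole column.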
Edited: 2011-03-19, 2:52 am
Joined: Oct 2007
Posts: 4,582
Thanks:
0
Okay, having tinkered with the Furigana plugin, for this deck it might be easier to just use it instead. (The above ruby markup was more a proof-of-concept, but I do think it's better to use the above in general since you don't have to edit fields because the markup's in the layout not the field, and you can have different coloured kanji and ruby, etc.)
I took the original Tanuki text and imported it into Anki as a 29 field deck with the fields named after the appropriate headings in the original Tanuki text, then got rid of all of the ▼ using Find/Replace in the card browser (after optimizing database), then replaced the @ and ▲ in all fields with [ and ], respectively. Edit: Replace ▼ with a space instead of just blanking it out.
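The three replacements could also be done in one pass before import. A minimal Python sketch, assuming the text only uses ▼, @ and ▲ as rubi markers (any literal @ elsewhere in the text would be converted too) and that the sample string reflects the real markup; the function name is hypothetical.

```python
# One-to-one character mapping: ▼ -> space, @ -> [, ▲ -> ]
TANUKI_TO_FURIGANA = str.maketrans({"▼": " ", "@": "[", "▲": "]"})


def tanuki_to_brackets(text: str) -> str:
    """Convert ▼base@reading▲ markup to the ' base[reading]' bracket form."""
    return text.translate(TANUKI_TO_FURIGANA)


if __name__ == "__main__":
    print(tanuki_to_brackets("兄から▼帽子@ぼうし▲を▼借@か▲りる。"))
```

Replacing ▼ with a space rather than deleting it keeps a boundary in front of each base.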
Since only one MeaningDistractor field is correct and I had a bit of a brain glitch, I searched in the browser for: ‘MeaningAnswer:1’, selected all those cards, and Find/Replaced (Find: .*$, enabled regex, left Replace blank) for the MeaningDistractor2 fields (all 3 of them), then did the same with the cards that came up in the search ‘MeaningAnswer:2’, blanking their MeaningDistractor1 fields. That way I could display both in the Card Layout section without having to worry about displaying the wrong distractor (I display the Distractor rather than CorrectAnswer fields in the layout because they have the rubi markup that was converted to bracketed furigana). I'm sure there's a better way to do that so there are no empty fields in the cards.
Oh, and I found I needed to edit the furigana.py to make sure the ruby alignment was Right instead of Center (and set FURI_OUTSIDE_READING = True so all fields will display furigana).
Play with the layout how you like and voila, you now have Tanuki with rubified Words, Meanings, and Sentences. It's not Ultima, but surely the fields can be combined.
Only problem now is that the furigana plugin seems to display the hover/highlight/tooltip thing on non-Question fields i.e. all fields w/ readings, and the lack of markup to tell the furigana which preceding kanji is the boundary, which might be throwing off the spacing for some of the compounds. Edit: Ahha, the furigana plugin uses a space preceding the base kanji, so replacing the ▼ with a space might work.
Edit: Anki's Update button is nice. You could probably add the three fields in the original Tanuki that have the rubi markup.
Final edit: Actually I think ruby (either with the markup or the plugin) looks best without individually doing each reading, so in that sense the initial mod to tanuki-ultima actually looks better, and works better for selecting the text.
Edited: 2011-03-20, 12:40 am
Joined: Aug 2009
Posts: 96
Thanks:
1
I haven't read the whole thread, but I happened to find a mistake in sentence 848.
I think that sentence should be written as 「弾が敵を外れてしまった。」 and not as 「弾が的を外れてしまった。」.
Joined: Feb 2007
Posts: 1,558
Thanks:
0
The intended meaning might have been 的 (まと) "miss the target" which is okay.
edit: btw, I looked at ~100 cards and noticed that there may be a few more possible "kanjifications" (for consistency or words typically in kanji.) Just something to keep an eye out for.
Edited: 2011-04-14, 5:51 am
Joined: Oct 2007
Posts: 4,582
Thanks:
0
Something fun: If you're using overture's morphology plugin, use the Japanese (kanji) definition in this deck as your Expression field to set your iPlusN/unknowns (just hit F2 and rename the field, then rename it back when you're done) and sort by iPlusN in the card browser to unsuspend and review the cards with the easiest definitions first to get a smoother monolingual experience.
Edited: 2011-06-08, 3:12 pm
Joined: Dec 2010
Posts: 211
Thanks:
0
Does anyone know how to do an advanced regex edit that highlights in PURPLE, underlines and bolds all of the vocab words in this deck?
for example if you export it from anki into a facts tab delimited file you get:
借りる かりる 返す約束で人の物を使う。 かえすやくそくでひとのものをつかう。 兄から帽子を借りる。 あにからぼうしを借りる。 借 しゃく か(りる) borrow,rent 訓
But I want to change it into the below (and for multiple times per line if the word is multiple times) to:
<span style="font-weight:600; text-decoration: underline; color:#e418ff;">借りる</span> かりる 返す約束で人の物を使う。 かえすやくそくでひとのものをつかう。 兄から帽子を借りる。 あにからぼうしを<span style="font-weight:600; text-decoration: underline; color:#e418ff;">借りる</span>。 借 しゃく か(>
Anyone know how to do this? I've been hitting my head against the wall for a few hours trying to nail this down. Here is the closest I've got with VIM or SED:
insert a ';' after the vocab word...
sed 's#\t#;\t#2' tanukiFACTS.txt > tanukiFACTS.txt-v2
then use vim, painfully, with this (not entirely working) command:
:%s%^\(...^I\)\(.*\);\(.*\)\2\(.*\)%\1<span style="font-weight:600; text-decoration: underline; color:#e418ff;">\2</span>\3<span style="font-weight:600; text-decoration: underline; color:#e418ff;">\2</span>\4
urghghhlll.... help... XD
Joined: Mar 2008
Posts: 1,533
Thanks:
0
I don't know of any way to make the search change for each line like that. You're actually doing a double search, first to determine the word, then to find the instances of the word in the line.
Instead, I would write up a quick script (php, perl, ruby, etc) and process it line by line instead.
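Something along these lines, as a rough Python sketch rather than a tested tool. It assumes the export is UTF-8, tab-delimited, with the vocab word as the first field of each line, and it wraps every occurrence of that word anywhere in the line (including the first field itself). The function names and output file name are placeholders.

```python
SPAN_OPEN = '<span style="font-weight:600; text-decoration: underline; color:#e418ff;">'
SPAN_CLOSE = "</span>"


def highlight_line(line: str) -> str:
    """Wrap every occurrence of the line's first tab-delimited field in the span."""
    word = line.split("\t", 1)[0]
    if not word:
        return line  # nothing to highlight on an empty first field
    # str.replace scans the original text once, so inserted markup is not re-matched.
    return line.replace(word, SPAN_OPEN + word + SPAN_CLOSE)


def highlight_file(src: str, dst: str) -> None:
    """Process the export line by line, writing the highlighted copy to dst."""
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for raw in fin:
            fout.write(highlight_line(raw.rstrip("\n")) + "\n")


if __name__ == "__main__":
    # File names are placeholders; adjust to wherever the export lives.
    highlight_file("tanukiFACTS.txt", "tanukiFACTS-highlighted.txt")
```

Because the word is read fresh from each line, there is no need for the double search in a single regex. If only some fields should be highlighted, split the line on tabs and apply the replace to just those fields.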
Edited: 2011-06-30, 6:21 am