kanji koohii FORUM (http://forum.koohii.com), Thread: "Are there kanji where 貝 appears on the right?" IDSgrep to the rescue (/thread-12093.html)

"Are there kanji where 貝 appears on the right?" IDSgrep to the rescue - aldebrn - 2014-08-20

§1. Intro

I recently read about IDSgrep on the KanjiVG mailing list, and after installing it, I got to use it right away during today's Anki reviews. I thought I'd give a mini-review/howto for IDSgrep, since the only other advertising this fancy tool has had is as part of its author's presentation on the broader Tsukurimashou font project at TUG (very kawaii slides online). (There are some unusual Unicode characters in what follows, but they all seem to render fine in all the browsers/OSes I've tested.)

§2. 賠

In my review, I stumbled on 賠, mixing up the left and right. Then I wondered, are there any kanji where 貝 appears on the right of a left/right unit? Let's ask IDSgrep:

$ idsgrep -C -dk '[lr](anything)貝'
【唄】⿰口<貝>⿱目?
【狽】⿰⺨<貝>⿱目?

Aha, I knew about 唄, where it makes sense, 口 being one of the few kanji that's more "primitive" than 貝, but that's it. (I'll get to 狽 in RTK3, and will deal with this exception to the rule then.)

How about to the left? Yes, there are quite a few:

$ idsgrep -C -dk '[lr]貝(anything)'
【則】⿰<貝>⿱目?刂
【戝】⿰<貝>⿱目?戈
【敗】⿰<貝>⿱目?<攴>⿱?乂
【財】⿰<貝>⿱目?才
【販】⿰<貝>⿱目?反
【貯】⿰<貝>⿱目?⿱宀丁
【貶】⿰<貝>⿱目?<乏>⿱丿之
【貼】⿰<貝>⿱目?<占>⿱卜口
【貽】⿰<貝>⿱目?<台>⿱厶口
【賂】⿰<貝>⿱目?<各>⿱夂口
【賊】⿰<貝>⿱目?戎
【賍】⿰<貝>⿱目?<庄>⿸广土
【賎】⿰<貝>⿱目?戔
【賑】⿰<貝>⿱目?辰
【賜】⿰<貝>⿱目?<易>⿱日勿
【賠】⿰<貝>⿱目?⿱<立>⿱亠?口
【賤】⿰<貝>⿱目?<戔>⿱戈戈
【賦】⿰<貝>⿱目?武
【賭】⿰<貝>⿱目?者
【賺】⿰<貝>⿱目?兼
【賻】⿰<貝>⿱目?⿱甫寸
【購】⿰<貝>⿱目?<冓>⿱三再
【贈】⿰<貝>⿱目?<曽>⿱?日
【贍】⿰<貝>⿱目??
【贐】⿰<貝>⿱目?<盡>⿱⿱聿灬皿
【贓】⿰<貝>⿱目?臧
【贖】⿰<貝>⿱目?<賣>⿱?<貝>⿱目?
【鵙】⿰<貝>⿱目?鳥

§3. Explanation

By now you've realized that IDSgrep is a command-line tool, and have forgiven it for being so. The "$" at the beginning is my command prompt, so whatever follows it is the actual command I ran.

The "k" in "-dk" tells it to search KanjiVG, which has about 6600 kanji. So KanjiVG has two characters with 貝 to the right, 28 to the left.
(Using the combined Chinese-Japanese-Korean CJKVI database, those numbers expand to 24 (right) versus 276 (left), several of which don't print in any font on Mac OS X, including Noto Sans Chinese, Google-Adobe's new font.)

The "-C" makes it print in pretty colors in my terminal; sorry, I don't know how to translate that to the forum.

The EIDS syntax for representing these takes a little getting used to. (There is an academic paper with a formal description if you can handle that. The IDSgrep manual has a simpler explanation in the "Technical details" section (page 17), though the Quick-start guide (page 3) is useful too.) Basically, characters that look like ⿰⿱⿴⿵⿶⿷⿸⿹⿺⿻⿲⿳ are actual Unicode Ideographic Description Characters meant to show binary or ternary (threesome) groupings in Chinese characters. The characters following one of those are put in the appropriate slots. <> around something means it's a kanji that can itself be broken down, and its EIDS tree follows. Question marks indicate radicals/characters that either don't have a Unicode code point associated with them or that the database chose not to represent as such. KanjiVG chose to represent the little 八 in 貝 as paths without identifying it as 八 the way CJKVI does, hence the many question marks in the above.

§4. 庭

Later in my review I flubbed 庭, omitting the 廴 radical (haste makes waste...). Before we can ask if 壬 ever appears inside a wrapper radical, first, how does IDSgrep represent that character? Let's look it up in the dictionary:

$ idsgrep -C -d 庭
:cjkvi-j.eids:【庭】⿸广<廷>⿺廴壬
:edict.eids:【庭】,<庭>⿸广<廷>⿺廴壬 ([にわ] (n\) (1\) garden/yard/courtyard/(2\) field (of action\)/area/(P\))
:kanjivg.eids:【庭】⿸广<廷>⿺廴壬

For 庭, the major grouping is ⿸.
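To make the prefix notation concrete, here is a minimal Python sketch of reading such a string as a tree. It is not part of IDSgrep and is not how IDSgrep itself works: it assumes plain Unicode IDS only (no EIDS <> heads, no ? placeholders treated specially), so the <廷> head from 庭's entry above has been stripped by hand. The names IDC_ARITY, parse_ids, and parse are made up for illustration.

```python
# Arity of each Unicode Ideographic Description Character (U+2FF0..U+2FFB).
# ⿲ and ⿳ take three operands; the others take two.
IDC_ARITY = {ch: 2 for ch in "⿰⿱⿴⿵⿶⿷⿸⿹⿺⿻"}
IDC_ARITY.update({ch: 3 for ch in "⿲⿳"})


def parse_ids(s, pos=0):
    """Parse one prefix-notation IDS expression starting at s[pos].

    Returns (tree, next_pos). A tree is either a single character (leaf)
    or a tuple (operator, [children]).
    """
    if pos >= len(s):
        raise ValueError("unexpected end of IDS string")
    ch = s[pos]
    pos += 1
    arity = IDC_ARITY.get(ch)
    if arity is None:
        return ch, pos  # an ordinary character is a leaf
    children = []
    for _ in range(arity):
        child, pos = parse_ids(s, pos)
        children.append(child)
    return (ch, children), pos


def parse(s):
    """Parse a whole IDS string, rejecting leftover characters."""
    tree, end = parse_ids(s)
    if end != len(s):
        raise ValueError("trailing characters after IDS expression")
    return tree


# 庭 with the <廷> head removed (hand-simplified from the output above).
tree = parse("⿸广⿺廴壬")
print(tree[0])  # prints ⿸, the major grouping
```

The recursion mirrors the "slots" description: each description character consumes the next two or three sub-expressions, so the first character of the string is always the top-level grouping.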
So now we can ask if 壬 ever appears inside a wrapper radical:

$ idsgrep -C -d '⿸(anything)壬'
:cjkvi-j.eids:【\x{221E6}】⿸广壬
:cjkvi-j.eids:【\x{2B29F}】⿸<虍>⿸⿱⺊<CDP-88E2>;<七>⿻<CDP-8DE4>;乚壬

The answer is yes, but only in two Chinese-only characters obscure enough for my terminal font not to display them (evil elitist terminal font). I won't make "Never put 壬 inside a wrapper radical" a rule, because it's too specific, but hopefully the exercise helped my subconscious "wrong kanji recognition" sense.

§5. 職

Finally, I was just curious about the EIDS representation of 職, one of my favorite kanji:

$ idsgrep -C -dc 職
【職】⿰耳<戠>⿰<音>⿱<立>⿱<CDP-8BAE>⿱<亠>⿱丨一丷一日<戈>⿻弋丿

(I'm using the CJKVI decomposition because KanjiVG's has only two elements---actually it's not that KanjiVG's representation of this is incomplete; rather, its nesting sequence couldn't be automatically determined by IDSgrep.)

That's pretty detailed. IDSgrep can indent the tree for us:

$ idsgrep -C -c indent -dc 職
【職】⿰
   耳
   <戠>⿰
      <音>⿱
         <立>⿱
            <CDP-8BAE>⿱
               <亠>⿱
                  丨
                  一
               丷
            一
         日
      <戈>⿻
         弋
         丿

So the breakdown is pretty straightforward, if detailed: mostly left/right and top/bottom splits. Interesting that 戠 is a simple left/right split, even though the last stroke of 立 extends to become the first stroke of 戈. For this kind of decompositional searching, this coarse level of detail, omitting things like stroke-sharing, is the right level. KanjiVG's representation of this kanji will be complicated because it represents the sharing of the 一 between two radicals (with another, 曰, in between them), which is presumably why IDSgrep can't automatically parse it.

I'm not quite sure exactly what that CDP-8BAE is, but GlyphWiki has an entry for it: it's apparently the first four strokes of 立. (GlyphWiki seems to be one of the several Japanese-only projects, like CJKVI and CHISE, that are analogous to KanjiVG, which seems mainly a gaijin production.
I often come across projects and movements and subcultures in different countries that are very similar but that aren't aware of each other, or in this case, don't readily cross-pollinate, due to language barriers, and it always reaffirms my commitment to multilingualism.)

And this gives us an example of the "⿻" relationship: if it could be treated as a function, ⿻(弋,丿) = 戈.

§6. Installation

IDSgrep is the work of one Linux hacker, Matthew Skala, so it's not as easy to install as some of the tools discussed on Koohii. Also, I confirmed with the author that there is a bug in the build instructions. But I was able to build it fine in Ubuntu and Mac OS X (with Xcode command line tools and Homebrew installed). In Windows, I imagine it's a snap with Cygwin but haven't tried (if you're a Windows user and want a binary, let me know and I can try building one for you---though I'm not sure if you can find a terminal emulator that'll handle Japanese I/O).

For Linux/Mac OS X:

1) Download idsgrep-0.5.1.tar.gz (https://sourceforge.jp/projects/tsukurimashou/releases/60839).

2) In the terminal, `tar xzf idsgrep-0.5.1.tar.gz` to extract it.

3) Download and place edict.gz and the latest KanjiVG XML in the newly-created idsgrep-0.5.1 directory.

4) Back in the terminal, `cd idsgrep-0.5.1; ./configure --disable-docs && make` to enter the directory, configure the Makefile, and then actually build the program. Hopefully it'll just warn you about missing components and such, not actually error.

4a) The configure script will warn you if you're missing PCRE (for regular-expression searching) and Buddy (makes IDSgrep faster). If you have Homebrew, you can `brew install pcre` before running the above to have extra goodness in IDSgrep; Ubuntu users should already have this. Buddy, despite having a terrible name, was easy to build and install on Mac & Ubuntu. But I think IDSgrep will work fine without these two.
4b) `sudo make install` if you want to copy the executable to the system; otherwise you can just run it via `./idsgrep` in the current directory.

I think it'll be a useful tool in my kanji learning experience. Perhaps not by providing actual actionable information that improves my kanji performance (though that would be nice), but by keeping it interesting and feeding my curiosity about the crazy, wonderful world of kanji.

Appendix

I ran IDSgrep in dictionary mode (using CJKVI) on 3028 kanji in RTK volumes 1 & 3, and put the resulting files at http://fasiha.github.io/IDSJoyoPlus/ --- note that clicking on the primary kanji on each line links to the indented tree-like version of the decomposition for that kanji. This is meant to be a quick reference for whenever I don't have IDSgrep available or just want to look at breakdowns wholesale.

"Are there kanji where 貝 appears on the right?" IDSgrep to the rescue - aldebrn - 2014-11-19

Belated update: I compiled this tool to Javascript using Emscripten so you can play with it in your browser: http://fasiha.github.io/idsgrep-emscripten/

You can run queries against the KanjiVG and CJKVI dictionaries, but please wait till they download (several megs) before you submit a query!

"Are there kanji where 貝 appears on the right?" IDSgrep to the rescue - Inny Jan - 2014-11-19

aldebrn Wrote: In my review, I stumbled on 賠, mixing up the left and right.

The issue you are talking about motivated me to formulate my stories in such a way that they include hints as to the placement of primitives. Usually the hint is some movement or action word that indicates direction. In the case of 賠, I have: money is given to... implying movement from left (the original placement of 貝) to right. If it were the other way around, my story would likely have something like: money is taken from..., shellfish leave..., or something else to that effect.