To RTK 2 or not to RTK2?

#26
woelpad Wrote:As radical_tyro just demonstrated, it's more difficult than it looks. The choice of signal primitives, the categorizing into pure, semi-pure and mixed, the selection of compounds, even the selection of meanings (one or two per compound), all that takes a substantial effort. Suppose you'd be given only the method and the set of kanji to operate on, there'd be little chance that you would make the same choices and create the same work.
I was just considering the "pure groups".

Granted, there is no escaping manually sorting through the data.

radical_tyro How about starting by grouping characters by reading first? In Trinity's On Yomi page for example clicking on the reading チュウ gives 中注柱虫忠仲昼抽宙衷駐. Manually trim out the kanji that don't fit the visual similarity. There is a finite number of such readings, 350+.
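For what it's worth, the "group by reading first" step could look roughly like the sketch below. The `SAMPLE_READINGS` table and the `group_by_reading` name are made up for illustration; a real run would load the kanji-to-reading data from a dictionary file such as KANJIDIC rather than from a hand-typed dict.

```python
from collections import defaultdict

# Hypothetical sample data: kanji -> list of on'yomi (katakana).
# A real script would read this from a dictionary file instead.
SAMPLE_READINGS = {
    "中": ["チュウ"],
    "注": ["チュウ"],
    "柱": ["チュウ"],
    "虫": ["チュウ"],
    "良": ["リョウ"],
    "量": ["リョウ"],
}


def group_by_reading(kanji_readings):
    """Invert a kanji -> readings mapping into reading -> list of kanji."""
    groups = defaultdict(list)
    for kanji, readings in kanji_readings.items():
        for reading in readings:
            groups[reading].append(kanji)
    return dict(groups)


groups = group_by_reading(SAMPLE_READINGS)
print(groups["チュウ"])
```

The output per reading is only a starting point; trimming out the characters that don't share a visual component is still the manual part.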

woelpad By the way, there's something to be said about freely usable data. Jim Breen's KANJIDIC, JMDICT, etc. have spawned countless useful programs. If someone spends time creating a list of "pure groups", which does not limit itself to RtK1 but instead to JLPT1 or KanjiKentei level 1 for example, that can be used by other programs.

Wrightak and his team are creating a great list of exemplary Japanese words for the kanji flashcards. I don't know what license they will choose to put on this yet, but it will potentially be useful to a great number of programs as well.


Nukemarine Wrote:Yeah, I think RTK2 can work great with Trinity. Mayhaps Mr. Heisig will agree to the use of his word choices (and pure groups I guess) for a preset list available to all there?
In Trinity you can pick an exemplary compound for every character. You can pick the exemplary compound that strikes your imagination (and thus memory) and not limit yourself to a preset choice.

You can pick a compound from the drop down list on the kanji page, based on other readings you already know so as not to overload yourself with unknown readings, or you can pick a compound based on your interests and the stuff that you read.

Those were some ideas that motivated me in building Trinity. Right now in the alpha the biggest problem with this is that you can't click kanji from vocablists or sentences (editor) to go to the kanji page, which makes the process of exploring readings and exemplary words a bit cumbersome as you have to go through the "On Yomi" page.
#27
Ok with some manual work, this is totally doable. Our list could be as good or even better than RTK2. The thing is, I'm not sure just how much manual work is involved. I need to take a break from all this coding so let me just make a note about how this would be done.

Firstly, I've generated a list of onyomi and the characters that have at least one (P) word with this reading AND no (P) words for any other readings. This list is very similar to what Fabrice has for Trinity, though I've noticed minor discrepancies in his list. Using just (P) words limits to about 15,000 words so that's not some small subset; and if we don't limit it then a few obscure words can throw things off.
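A minimal sketch of that filtering rule, under some assumptions: the input is taken to be a flat list of `(kanji, onyomi)` pairs, one for each time a character appears with that reading in a (P) word. How those pairs are extracted from the dictionary is not shown, and the function name and sample data are hypothetical.

```python
from collections import defaultdict


def single_reading_groups(observations):
    """Group kanji by on'yomi, keeping only kanji seen with exactly one reading.

    observations: iterable of (kanji, onyomi) pairs taken from priority words.
    """
    readings_seen = defaultdict(set)
    for kanji, reading in observations:
        readings_seen[kanji].add(reading)

    groups = defaultdict(list)
    for kanji, readings in readings_seen.items():
        if len(readings) == 1:  # no priority word uses any other reading
            (reading,) = readings
            groups[reading].append(kanji)
    return {reading: sorted(kanji_list) for reading, kanji_list in groups.items()}


# Hypothetical observations; 日 is dropped because it shows up with two readings.
sample = [
    ("僚", "リョウ"),
    ("寮", "リョウ"),
    ("寮", "リョウ"),
    ("日", "ニチ"),
    ("日", "ジツ"),
]
print(single_reading_groups(sample))
```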

This list would be broken up amongst the team to work on. Here's an example:

リョウ : 良量両領了瞭療僚陵料涼猟寮

We notice the occurrence of 僚寮療瞭 and now want to find out if there are other characters with this pattern but which have a different reading. We look at the intersection of radicals: 小 日 and select those which form the primitive in question, which in this case is both of them (yeah, 僚 is broken down as just 化 小 日). We then look at the list of all other kanji which have both of those radicals:

小 日 : 影景緒織紳練

None of them fit our criteria, so this group is pure and

僚寮療瞭 : リョウ

goes into our database. Had there been any that fit our criteria, the group would be semi-pure, mixed, or worthless :-p.
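The radical-intersection step could be sketched like this. Only the 僚 breakdown (化 小 日) comes from the example above; every other entry in `RADICALS` is an illustrative guess, not copied from a real radical file, and the function names are hypothetical. The script only narrows down the candidates; deciding whether a candidate really contains the component is still a manual, visual call.

```python
# Illustrative decomposition table: kanji -> set of radicals.
# Only the 僚 entry is taken from the post; the rest are assumptions.
RADICALS = {
    "僚": {"化", "小", "日"},
    "寮": {"宀", "小", "日"},
    "療": {"疒", "小", "日"},
    "瞭": {"目", "小", "日"},
    "景": {"亠", "口", "小", "日"},
    "影": {"彡", "亠", "口", "小", "日"},
    "良": {"艮"},
}


def shared_radicals(group, radicals):
    """Radicals common to every kanji in the candidate group."""
    return set.intersection(*(set(radicals[k]) for k in group))


def candidates_for_review(group, radicals):
    """Kanji outside the group that contain all shared radicals."""
    shared = shared_radicals(group, radicals)
    return sorted(
        kanji
        for kanji, rads in radicals.items()
        if kanji not in group and shared <= set(rads)
    )


def is_pure(group_reading, confirmed, readings):
    """True if every manually confirmed look-alike has the group's reading.

    confirmed: kanji a human judged to contain the same component.
    readings: kanji -> on'yomi for those confirmed kanji.
    """
    return all(readings[k] == group_reading for k in confirmed)


group = ["僚", "寮", "療", "瞭"]
print(shared_radicals(group, RADICALS))
print(candidates_for_review(group, RADICALS))
# In the example above none of the candidates survive the manual check,
# so the confirmed list is empty and the group counts as pure.
print(is_pure("リョウ", [], {}))
```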

Anyway, there's still a lot I'd need to do before I enlist anyone's help. But, I fear I have just burnt out on this project already.
Edited: 2008-07-01, 9:18 pm
#28
Fabrice, mucho mucho sorry. I didn't realize how well Trinity had been designed for vocabulary practice.

Yeah, I'll be using that for RTK2. Any comments from others who used Trinity for the RTK2 portion?
#29
ファブリス Wrote:By the way, there's something to be said about freely usable data. Jim Breen's KANJIDIC, JMDICT, etc. have spawned countless useful programs. If someone spends time creating a list of "pure groups", which does not limit itself to RtK1 but instead to JLPT1 or KanjiKentei level 1 for example, that can be used by other programs.
You don't need to convince me. The more data, the more fun. But if that data is distilled out of a copyrighted work without any attempt to contact the author or the company, or against those people's wishes, you're not being responsible.

The same of course applies to that primitive list based on RtK1/3. This site has so far abstained from publishing or using such a list, even though it should be quite possible to extract it from the current database. And rightfully so, because the prudent assumption is that that is copyrighted too, and the decision to allow free circulation of such a list should be with the copyright holder. But I'm not his watchdog. I just want to make it clear that asking is easy, easier than fighting over the 'what ifs'. If you don't like the answer, or if that answer never comes, that's a different story.
#30
radical_tyro Sorry, I didn't want to put pressure on you, I was just throwing out ideas. If you really want a list of pure groups for making flashcards and such, you can of course try to get permission from James Heisig directly. Although in that case he'd have to provide you with some kind of spreadsheet, otherwise you will have to edit it manually anyway.

Nukemarine If you bought RtK2 you can of course just pick the same compounds from the dropdown list in case you have doubts about what to pick. There may be a few instances where Heisig chose a place name or something rare that would fit the exemplary purpose but won't show up in Trinity.

woelpad I am not advocating copying any data from RtK of course.

But if you build a script and use data available with permissive licenses such as KANJIDIC, your script is proof of your "source".

There is just something that bugs me here, given that the kanji themselves are something you would consider "public domain". I find it highly annoying that "selections" of characters become copyrighted material when you can build the same with publicly available data.

I don't contest that someone could spend hours manually sorting through data and then considering that "selection" as their non-reproducible, "original" content. I am just saying at the same time I find it highly annoying to think that it's all based on popular heritage.

These are my opinions, I am in no way advocating deriving any works from James Heisig's excellent RtK series.

I do believe though that it is possible to generate similar lists yourself without breaching any copyrights. Such lists wouldn't be perfect but who needs a "perfect" list with a subject matter like 10000+ kanji ?

PS: "primitives" are something different altogether; they are clearly identifiable RtK material. If someone creates a list of characters whose reading can be directly derived from a phonetic component without exception (i.e. RtK2's "pure groups"), then you would manually sort the script output based on visual similarity and not based on "primitives", and if you need to decompose them then you decompose them based on the Chinese radicals. If you make flashcards of characters linked to readings you don't need to tie in any RtK material though, just kanji > reading.
#31
Primitives aren't even copyrighted. The only way they would be is if you used his exact meanings for each of them. I mean, I could write a book with a list of my interpreted "primitives" and publish it without any problems. Signal primitives are commonly known in kanji, and are not copyrighted either.

"Works consisting entirely of information that is common property and containing no original authorship (for example: standard calendars, height and weight charts, tape measures and rulers, and lists or tables taken from public documents or other common sources)"
http://www.copyright.gov/circs/circ1.html#wnp

"Ideas, Methods, or Systems are not subject to copyright protection. Copyright protection, therefore, is not available for: ideas or procedures for doing, making, or building things; scientific or technical methods or discoveries; business operations or procedures; mathematical principles; formulas, algorithms; or any other concept, process, or method of operation."
http://www.copyright.gov/circs/circ31.html
[This does not apply if a process has been patented.]
#32
alyks Wrote:Primitives aren't even copyrighted.
Hi alyks, I suppose you mean the Chinese radicals, the constituent parts of the Chinese characters?

Usually when we say "primitives" it's to refer implicitly to Remembering the Kanji's naming of components, many of which are Chinese radicals, some of which are combinations of Chinese radicals; i.e. visual components of the characters.

The naming and selection of the "primitives" in itself is James Heisig's work. It could be considered like a code for mnemonics. While the applying of mnemonics to kanji study in itself may not fall under copyright (your pt 2), the naming and selection of primitives I believe does.
#33
GoodSirJava Wrote:The best way to learn Kanji readings is indirectly, similarly to how English speakers can sound out unfamiliar words (or nonsense words, or onomatopoeic words, or "pseudo-English" pronunciations of foreign words and names) by (unconscious) analogy to other words they already know, without having to consciously resort to some tremendous list of unreliable rules.

In short, RTK2 is aiming at the wrong target. Heisig先生 revolutionized how people learn Kanji, but apparently stuck to old assumptions about how people should learn the Japanese language.
Sorry to yank the topic back. GSJ, I think this is the answer I was looking for. From what I can see, I think the sentences might be the best way for me to go after RTK. Thanks for all your help, everyone.
#34
ファブリス Wrote:
alyks Wrote:Primitives aren't even copyrighted.
Hi alyks, I suppose you mean the Chinese radicals, the constituent parts of the Chinese characters?

Usually when we say "primitives" it's to refer implicitly to Remembering the Kanji's naming of components, many of which are Chinese radicals, some of which are combinations of Chinese radicals; i.e. visual components of the characters.
What I mean is, any person can come up with a list of visual components that make up all the kanji. It becomes a copyright issue when you use the same meanings Heisig assigns. The selected primitives are not copyrighted either. Basic components like 口 are not copyrighted, radical or not. Combinations that aren't obvious or radicals, like sunglasses in 傑, are not either.

Of course, the exact same list in the exact same order would be a copyright issue.
Edited: 2008-07-02, 4:55 pm
#35
According to a website called "Intellectual Property Law Server", raw data that represents general knowledge or even specialized knowledge (in this case language rules), would be considered public domain in most cases.

I found an answer to a question about whether math formulas can be copyrighted, and I believe that the answer would also pertain to "language rules":

"A copyright only protects a mode of expression. If you duplicated the formulas displayed from a source with a registered copyright, then there would be a risk. If you use the math formula identical to what you saw, the essence of the formula ought not be protected except as part of a patent. The copyright protects the mode of expression not its meaning."

Basically, it sounds to me like outright duplication of a general knowledge list with the exact same order might be somewhat of a grey area if it represents an order created by a lone individual or corporation, but if you as the one doing the copying are going to change the order, I don't see the problem from a legal standpoint. In fact, if you have a list of knowledge where the order has some logical basis or even a commonly-used arbitrary order, I would say copy as much as you like.... For example, the alphabet list "ABCDEFGHIJKLMNOPQRSTUVWXYZ" seems pretty arbitrary, but I don't see how you could copyright it since a lot of people adhere to it as a language rule. On the other hand, I MIGHT be able to copyright "BJANOSZLWKZDYTVQMGXC", but even that seems like a stretch. Just because written Japanese has a much more complicated structure doesn't make it some kind of legal exception. "Pure Groups" follow rules of signal primitives which have universal meaning, and which, therefore, are themselves public domain. So if you have any concerns, you can just change the order. However, I don't think you would be able to legally use Heisig's RTK 2 vocab compound examples no matter what order since the words that he picked could be considered a form of "expression."

I don't think you would even need to write a script to extract the "pure groups" unless you wanted to. You could just copy from Heisig, and then afterwards you can randomize the kanji order so that the public domain uncopyrightable "meaning" remains the same, but the "expression" becomes your own. Obviously, Heisig would be saving you some work here, but I don't think you have any legal obligation to credit him beyond common courtesy.
Edited: 2008-07-02, 7:45 pm
#36
It's interesting when conversations turn to copyright on various threads. Perhaps there should be a "Copyright" discussion thread somewhere on the general forum.

It's an important discussion for us, as we're discovering that for efficient study we need resources that generally come from copyrighted items in addition to public domain and open sharing of ideas. We understand that this is hard work and want to do right by the appropriate authors.

Also, it's good that a large number of us are taking the high road even when we could go around the rules. Guess the true measure of good character is doing the right thing even when you wouldn't get caught doing the wrong one.