
What is the point of 12,800 kanji?

#26
toshiromiballza Wrote:They wanted to include oracle bone script in Unicode too, but they scrapped the idea later. Maybe in the future...
I thought it was sarcasm at first, then I did a bit of research...

http://www.unicode.org/roadmaps/tip/
Reply
#27
I just re-read my post, and I seem to have put in every piece of tangentially relevant information I could think of, but left out the answer to the question: Why should standards like JIS or Unicode have as many characters as possible?

The answer is: because they're nation-wide (in the case of JIS) and world-wide (in the case of Unicode) standards. Their purpose is to allow for universal connectivity. They are meant to let every computer, OS, programming language, etc. in the world that adheres to the standard exchange digital text seamlessly, without any need for deciphering and re-encoding it.

For that, they need to be complete. Otherwise, some people would be forced to use a different standard. Only a complete standard can be used by everyone. Take, for instance, the current situation, with Japan using a different encoding (JIS) while we use Unicode.

That's because, back in the day, the Western computer world made the same argument you're making: we don't need 110,000 characters in our standard, we just need 256 (English letters, Arabic numerals, and a few extra symbols like *^%&*) that can be conveniently represented in one byte (eight zeros and ones).

But of course, Japan joined the computing revolution pretty fast, and that didn't work for them. So they came up with their own standard, which worked for them. The West eventually realized its folly, and Unicode was created as a universal standard for all, but by that time it was too late: Japan was already using a different standard. So now, every time someone sends me a text file in Japanese, I have to convert it before I can see it, because it's encoded differently. If I write software that's supposed to use such text, again, tons of extra work. Dealing with software already written, where the programmers had no idea about the different encodings: a nightmare. All because some people, decades ago, didn't think a standard for encoding text should be as complete as they could make it, even if including some characters only accommodates 10 people in the whole world.
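The mismatch is easy to demonstrate. Here's a minimal Python sketch (the sample string is arbitrary) of what happens when JIS-encoded bytes reach a program expecting a different encoding:

```python
# Encode Japanese text with Shift_JIS (one of the JIS-based
# encodings), then try to read the bytes back the way a
# mismatched program would.
text = "日本語"
jis_bytes = text.encode("shift_jis")   # b'\x93\xfa\x96{\x8c\xea'

# A strict UTF-8 reader rejects these bytes outright...
try:
    jis_bytes.decode("utf-8")
except UnicodeDecodeError:
    print("not valid UTF-8")

# ...while a Latin-1 reader silently shows gibberish (mojibake).
print(jis_bytes.decode("latin-1"))

# Only the matching codec recovers the original text.
assert jis_bytes.decode("shift_jis") == text
```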
Reply
#28
Yes, and then the West also decided to go ahead with the han codepoint unification despite the objections of pretty much everyone in East Asia, so the saga continues :)
Reply
#29
Speaking of different standards in computer text and the futility of trying to change an already established standard like Japan's unicode, and to relate it back to the topic:

What is the point of Kanji?

Japan should get rid of not only Kanji but katakana and hiragana as well and replace it with our Western letters. Who cares that Kanji have existed in Japan for centuries, they need to join the modern world and overhaul their entire writing system so that we Westerners can read them. I like Japan but they're too stuck in the past. While the rest of the world are using Windows 8 and sending emails to each other, Japan is still using Windows XP and fax machines. The Japanese writing system is simply the epitome of their collective refusal to modernize.
Edited: 2014-04-30, 1:06 pm
Reply
#30
qwertyytrewq Wrote:Japan should get rid of not only Kanji but katakana and hiragana as well and replace it with our Western letters. Who cares that Kanji have existed in Japan for centuries, they need to join the modern world and overhaul their entire writing system so that we Westerners can read them.
Is this your actual opinion or is this tongue-in-cheek? I find it hard to believe that any student of a language would purposely destroy what they study just so they could use windows 8 and/or so non-students could dabble in it with less effort.

You can't possibly mean what you wrote, right? Right?
Reply
#31
qwertyytrewq Wrote:Japan should get rid of not only Kanji but katakana and hiragana as well and replace it with our Western letters.
Agreed, but this is too long-standing a debate to rehash here, and it's quite difficult to change a writing system. (EDIT: I'm not sure if qwerty was being sarcastic, but I'm not. As I said, though, there have been several 15-20+ page debates about this on the forum in the past, so there's really no need to do it again.)

Back to the topic: the ~6500 kanji of the JIS 1 and 2 sets are enough to write almost everything in modern Japanese, including virtually all place and personal names. It was an explicit goal of the team that compiled the JIS 2 set to be as comprehensive as possible with name kanji.

The additional kanji beyond those 6500 are almost entirely composed of variants of commonly used characters. Many of the variants differ by only a line or dot. In most cases these can be replaced with standard versions of the characters, and this is commonly done even in serious scholarly editions of pre-modern works.

The non-JIS1/2 kanji that are not simple variants do turn up occasionally in pre-World War II writing. The only people who really need to make extensive use of kanji outside the JIS1/2 set are those working with Chinese/kanbun texts. There you encounter a lot of kanji that were never used to write Japanese (historical or modern) and thus did not make it onto the JIS1/2 list.
Edited: 2014-04-30, 1:58 pm
Reply
#32
yudantaiteki Wrote:
qwertyytrewq Wrote:Japan should get rid of not only Kanji but katakana and hiragana as well and replace it with our Western letters.
Agreed, but this is too long standing a debate to rehash here, and it's quite difficult to change a writing system.
They already learn English and hence "Western letters" in compulsory school.

This change would only make things a bit (not much) easier for Westerners; it wouldn't make it easier for a Japanese person to learn English or programming, nor make it faster to type or read.

And China (the kan in kanji) is overtaking the US as the largest economy by the end of the year (according to The Economist).
Reply
#33
I do suspect qwerty was going a little overboard for effect, but I have to wonder - why would you think that switching Japanese to roman characters would be a boon to anyone in particular? Even with Hepburn, it's really easy to make incorrect guesses about pronunciation based on ingrained habit in English spelling and pronunciation. I'll grant you Kanji are expendable (the Koreans I think proved that, especially the North), but what's wrong with the kana? Is it too much to ask learners to memorize 48 or 96 characters? Hangul's about as bad and you don't really hear too much against that.
Edited: 2014-04-30, 9:42 pm
Reply
#34
Sauzer Wrote:I do suspect qwerty was going a little overboard for effect, but I have to wonder - why would you think that switching Japanese to roman characters would be a boon to anyone in particular? Even with Hepburn, it's really easy to make incorrect guesses about pronunciation based on ingrained habit in English spelling and pronunciation. I'll grant you Kanji are expendable (the Koreans I think proved that, especially the North), but what's wrong with the kana? Is it too much to ask learners to memorize 48 or 96 characters? Hangul's about as bad and you don't really hear too much against that.
Probably the way Korean used to be written, with hangul for their equivalent of kun words and hanja for their equivalent of on words, would be the most sensible thing. This way you avoid having obvious cognates being written with different kanji (to all of you who think the Japanese don't use keywords: lol.)

IMO romaji is not optimal for writing Japanese due to homophony and the composite nature of most vocabulary, but the current writing system isn't well suited to mass literacy - kanji are fine for monks, noblemen, and scriveners, but when it takes 8 years to teach kids how to write, and they forget half of it if they stay away from Japan for an extended period, you know something's wrong.

Don't get me wrong, I do like kanji a lot. Japan won't change their writing system anyway.

I've also fantasised a lot about English having a more rational writing system too...
Reply
#35
qwertyytrewq Wrote:Speaking of different standards in computer text and the futility of trying to change an already established standard like Japan's unicode and to relate it back to the topic:

What is the point of Kanji?

Japan should get rid of not only Kanji but katakana and hiragana as well and replace it with our Western letters. Who cares that Kanji have existed in Japan for centuries, they need to join the modern world and overhaul their entire writing system so that we Westerners can read them. I like Japan but they're too stuck in the past. While the rest of the world are using Windows 8 and sending emails to each other, Japan is still using Windows XP and fax machines. The Japanese writing system is simply the epitome of their collective refusal to modernize.
There's no such thing as "Japan's unicode". Unicode is a character set that assigns a unique number to every character used for writing, anywhere - about 110,000 of them, currently. It does this to allow text-based computing: integrated circuits can only perform operations on numbers, not text, so text must be converted to numbers in a predictable way before computers can use it, and converted back to display it to humans. It also allows seamless connectivity among such computers, because a single, universal standard for converting text to numbers is a precondition for that; otherwise, a computer using one standard to encode text could not send that text to another computer for display to its user. For instance, if you and I were using different standards to encode what I'm writing now, you couldn't read this; it would show up as gibberish. For a more concrete example: if you go to the innocent books thread, download one of the boo...ahem, book reviews encoded with JIS, and then try to open it, you most likely won't see Japanese text. You'll see gibberish, because your browser/notepad will most likely try to open it as if it had been encoded with Unicode.

Unicode contains every single character the various JIS standards do. If everyone on the planet adopted Unicode (specifically, the UTF-8 encoding of it, because that's the best one), that would solve the issue of encoding text to allow text-based computing, and seamless interaction among programs and networks which perform text-based computing, perfectly. For everyone, in every language.
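To sketch why UTF-8 in particular works so well as the universal choice: it leaves the old one-byte English world untouched and only spends extra bytes where a language actually needs them (Python, characters chosen arbitrarily):

```python
# UTF-8 is backward compatible with the old 256-character world:
# ASCII characters still take exactly one byte...
assert "A".encode("utf-8") == b"A"               # 1 byte, same as ASCII

# ...while kana and kanji take three bytes each.
assert "あ".encode("utf-8") == b"\xe3\x81\x82"   # U+3042, 3 bytes
assert len("漢字".encode("utf-8")) == 6          # 3 bytes per kanji

# Either way, every character maps to a single Unicode codepoint.
print(hex(ord("あ")))  # → 0x3042
```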

There are no relevant similarities whatsoever between this issue and the suggestion that Japan change its writing system. Sarcastically asking what the point of Kanji is relates to the topic about as much as a picture of a kitten would. And the cat would at least have been adorable.
Edited: 2014-05-01, 5:31 am
Reply
#36
Stansfield123 Wrote:Unicode contains every single character the various JIS standards do. If everyone on the planet adopted Unicode (specifically, the UTF-8 encoding of it, because that's the best one), that would solve the issue of encoding text to allow text-based computing, and seamless interaction among programs and networks which perform text-based computing, perfectly. For everyone, in every language.
Ironically, except for people writing in languages that employ han characters, due to the aforementioned "han unification" space saving measure they did years ago. So if a person is writing a text in primarily Japanese-style Kanji, but wants to (for historical or stylistic purposes) use an older form - there is a good chance that old form shares a codepoint with the contemporary Japanese glyph - they cannot do it without switching that one character to an entirely different traditional Chinese font. Edge case? Sure, but I'm sure it's an annoying one when you encounter it. Incidentally I don't think JIS helps at all there, so they're basically stuck with such workarounds.
Reply
#37
Sauzer Wrote:
Stansfield123 Wrote:Unicode contains every single character the various JIS standards do. If everyone on the planet adopted Unicode (specifically, the UTF-8 encoding of it, because that's the best one), that would solve the issue of encoding text to allow text-based computing, and seamless interaction among programs and networks which perform text-based computing, perfectly. For everyone, in every language.
Ironically, except for people writing in languages that employ han characters, due to the aforementioned "han unification" space saving measure they did years ago. So if a person is writing a text in primarily Japanese-style Kanji, but wants to (for historical or stylistic purposes) use an older form - there is a good chance that old form shares a codepoint with the contemporary Japanese glyph - they cannot do it without switching that one character to an entirely different traditional Chinese font. Edge case? Sure, but I'm sure it's an annoying one when you encounter it. Incidentally I don't think JIS helps at all there, so they're basically stuck with such workarounds.
What do you mean by "except for people writing han characters"? Unicode doesn't provide different codepoints for characters that differ in style, for any language. There's no "olden-timey A" and "modern A" in Unicode either. There's just one A, to which you can add any formatting you'd like. That falls beyond the scope of Unicode.

Style is kept separate from content on purpose. Unicode addresses content, fonts address style. And, like you said, the Japanese alternatives don't provide that either (because they follow the same principle of separating content and style).

Of course you have to use different fonts to display a character in different styles. And, unlike with content, there's not much use in trying to unify standards for style. It's fine if one computer has one font and another has a different one; they can still communicate text to each other.

It's very important to keep those two layers separate, because content falls into the exclusive realm of programmers (the average online blogger, shop owner, writer, etc doesn't need to know anything about how text is encoded and transmitted over the Internet), while style is something that concerns everyone (even the most shallow user wants some control over how text looks on their blog or in their forum post).

So wouldn't this hypothetical Japanese scholar who wants old characters to be different be happier if he can achieve that by selecting a different font, instead of having to learn about Unicode, hex numbers, code points, and the whole lot?
Edited: 2014-05-01, 7:32 am
Reply
#38
No, with Unihan, variant characters are definitely assigned the same codepoint. See http://www8.plala.or.jp/tkubota1/unicode...nihan.html

The article has two example images of character variants that share a codepoint:
http://www8.plala.or.jp/tkubota1/U6D77.png
http://www8.plala.or.jp/tkubota1/U76F4.png
Here's one I found on tofugu:
http://www.tofugu.com/wp-content/uploads...fusion.png

AKA why my RTK deck in AnkiDroid used to look so messed up :P

http://homepage.ntlworld.com/jonathan.de...ation.html
the above Wrote:Additionally, the rationalization has given lie to the claim that (to quote chapter 6 of Java Internationalization, ISBN 0596000197) "dealing with unification is simply a matter of choosing a font that contains the glyphs appropriate for that country". As Peterson explains with examples, dealing with the Unicode CJKV rationalization sometimes requires not just specifying a font but specifying a national language as well. Unicode has not in practice eliminated the need for specifying what language one is using in order to specify which characters the character set denotes.
Reply
#39
Sauzer Wrote:http://www8.plala.or.jp/tkubota1/U76F4.png
Oh, OK, I see how that's a problem. But it's a problem that having a separate Japanese standard won't solve anyway. JIS certainly doesn't solve it, since it doesn't have codepoints for the Chinese or Korean versions, it's only concerned with Japanese writing.

You'd need a Han-wide standard to address it. Or just forget about it since it's a minor problem that's USUALLY addressed by using the right font. A Japanese user, using a Japanese font to display Japanese text will not have an issue with this. Even for most Chinese users of Japanese sites, there's a solution: install a Japanese font.

Nevertheless, thanks for clearing things up; you've convinced me that this is a distinction that should be made at the content level, by the Unicode people. But does that really warrant refusing to embrace Unicode?
Quote:As Peterson explains with examples, dealing with the Unicode CJKV rationalization sometimes requires not just specifying a font but specifying a national language as well. Unicode has not in practice eliminated the need for specifying what language one is using in order to specify which characters the character set denotes.
What would be a real life example where specifying a font doesn't solve the problem?

I can think of one hypothetical one: someone wishes to transmit a single block of indivisible text with both Chinese and Japanese in it. In that case, specifying a font for the whole text wouldn't solve the problem, and specifying different fonts wouldn't be possible. But, in practice, are there even fonts that can handle both Japanese and Chinese, to begin with?
Edited: 2014-05-01, 7:56 am
Reply
#40
Unicode has an Ideographic Variation Database to confront this issue (not just working on it - it's already implemented).

And it doesn't make sense to encode each glyph variation with a unique index. It's like encoding Fraktur into Unicode instead of letting a font take care of it.
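For the curious, the IVD works by appending an invisible "variation selector" codepoint to the base character, so the variant choice travels with the text itself rather than with a font setting. A Python sketch (which registered selector maps to which glyph is font/IVD-dependent, so treat the specific selector here as illustrative):

```python
# An Ideographic Variation Sequence: base character plus an
# invisible selector from the Variation Selectors Supplement.
# Here: U+76F4 (直) followed by U+E0100 (VARIATION SELECTOR-17).
ivs = "\u76f4\U000e0100"

# Two codepoints in the text, but one visible glyph on screen.
assert len(ivs) == 2
assert ord(ivs[0]) == 0x76F4
assert ord(ivs[1]) == 0xE0100

# An IVD-aware font renders the requested variant; an older font
# simply ignores the selector and draws its default glyph.
```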

Stansfield123 Wrote:But, in practice, are there even fonts that can handle both Japanese and Chinese, to begin with?
Probably only HanaMin. Others are country-specific, but include all native variations nonetheless (IPAex for example).
Edited: 2014-05-01, 8:22 am
Reply
#41
qwertyytrewq Wrote:Japan should get rid of not only Kanji but katakana and hiragana
In case people didn't pick up on it, in my previous post I was being hyperbolic and satirizing and making fun of the (former) Japanese language learners who gave up on learning Japanese because they suck at it so they rationalize their failure by saying that Kanji should ultimately, in a just world, conform to their Western mentalities instead of them accepting that the Japanese language is what it is and that they should stop sucking. To further clarify, I do not believe anything I wrote in that post, because I do not suck.
Edited: 2014-05-01, 12:28 pm
Reply
#42
I'm not sure why people here act like obscure kanji aren't used. There are tons of popular modern works that use loads of them.

poblequadrat Wrote:
Sauzer Wrote:I do suspect qwerty was going a little overboard for effect, but I have to wonder - why would you think that switching Japanese to roman characters would be a boon to anyone in particular? Even with Hepburn, it's really easy to make incorrect guesses about pronunciation based on ingrained habit in English spelling and pronunciation. I'll grant you Kanji are expendable (the Koreans I think proved that, especially the North), but what's wrong with the kana? Is it too much to ask learners to memorize 48 or 96 characters? Hangul's about as bad and you don't really hear too much against that.
Probably the way Korean used to be written, with hangul for their equivalent of kun words and hanja for their equivalent of on words, would be the most sensible thing. This way you avoid having obvious cognates being written with different kanji (to all of you who think the Japanese don't use keywords: lol.)

IMO romaji is not optimal for writing Japanese due to homophony and the composite nature of most vocabulary, but the current writing system isn't well suited to mass literacy - kanji are fine for monks, noblemen, and scriveners, but when it takes 8 years to teach kids how to write, and they forget half of it if they stay away from Japan for an extended period, you know something's wrong.

Don't get me wrong, I do like kanji a lot. Japan won't change their writing system anyway.

I've also fantasised a lot about English having a more rational writing system too...
Kanji are a lot more concise, and portray much more history and profound meaning in the text than dry entirely phonetic Latin based script.

An example: Canada is called "Canada", but nobody knows why it's called that or what the name means, as the natives aren't around anymore and all we have is the name transliterated into English. Kanji are a strength of the Japanese language, not a weakness. And with better, more streamlined learning methods, they're not very difficult to learn at all.

It's like reading with an extra dimension of comprehension.
Edited: 2014-05-01, 4:01 pm
Reply
#43
ryuudou Wrote:I'm not sure why people here act like obscure kanji aren't used. There are tons of popular modern works that use loads of them.
Then they're not obscure, are they? :P
Reply
#44
Stansfield123 Wrote:But does that really warrant refusing to embrace Unicode?
No, my god, definitely not - I'm just saying the implementation was not as perfect as it might seem. Unicode is still a fantastic and necessary project.

toshiromiballza Wrote:It's like encoding Fraktur into Unicode instead of letting a font take care of it.
This is a good point, and to us (outside the Han sphere) I'm sure the whole debate looks like splitting hairs over the distinction between serif and sans-serif fonts.

I guess the idea is that it's a bit more like if Fraktur were the only typeface Germany ever used - the only one taught, down to the stroke orders being canonical - and the English only ever used Spencerian. To one another they might be quite hard to grok at first. And even if they're not, nationalism (which the Han sphere has no shortage of) has a way of making these things much dicier.

ryuudou Wrote:Kanji are a lot more concise, and portray much more history and profound meaning in the text
Concise in raw codepoints, maybe, but the difference in effort between writing Economics and 経済 is negligible at best. I'm sympathetic that kanji have historical and cultural significance to their users, but the 'profound meaning' bit is Orientalist cruft, even when they do it!
Reply
#45
yudantaiteki Wrote:
ryuudou Wrote:I'm not sure why people here act like obscure kanji aren't used. There are tons of popular modern works that use loads of them.
Then they're not obscure, are they? :P
They are for normal people, in everyday life and reading, but that doesn't mean they aren't used in a lot of modern works, which was my point.
Reply
#46
Non-JIS1/2 kanji? I will admit I don't read a lot of popular works but it would surprise me to learn that they frequently use lots of kanji that aren't in the JIS1/2 set. What are some examples?

Also I said I wasn't going to engage the kanji/romaji debate again, but I just wanted to address one point:
Quote:by saying that Kanji should ultimately, in a just world, conform to their Western mentalities instead of them accepting that the Japanese language is what it is and that they should stop sucking.
The first criticisms of kanji and implications that kana or roman letters would be better date at least to the mid-18th century and come from native Japanese writers. For instance, the famous 国学者 Kamo no Mabuchi wrote something in 1765 criticizing the difficulty of learning and remembering Chinese characters, praising India and Holland for their writing systems, and expressing regret that Japan did not use only kana. Of course some of this criticism was coming from the anti-Chinese bias of the kokugakusha but the idea is much older than the time when many foreigners were learning Japanese.
Edited: 2014-05-02, 2:04 pm
Reply
#47
I do research in pre-Qin dynasty palaeography. I'm in grad school in Taiwan right now, so my papers are all written in Chinese. I have fonts that cover over 77,000 characters (including Unicode CJK Extensions A and B), but that's still not enough and I end up having to create images of characters to embed in my papers. It's a huge pain. I think if there were only 12,800 covered in Unicode, I'd probably give up.

There are most certainly more than 10 people in my field, by the way, including a lot of Japanese scholars.

I find the notion that "No human being could possibly remember" 12,800 characters really laughable. Come on, man. I know probably 4000 plus a lot of variants, and I've only been studying for about 3 years. I can guarantee many of my professors know more than 12,800. Presumably the same goes for professors in Chinese departments in Japan, too. You spend your life reading Chinese from all periods, doing research on pre-standardized Chinese texts, and that just happens.
Reply
#48
qwertyytrewq Wrote:
qwertyytrewq Wrote:Japan should get rid of not only Kanji but katakana and hiragana
In case people didn't pick up on it, in my previous post I was being hyperbolic and satirizing and making fun of ...
@qwertyytrewq: I thought you were pretty funny, actually :D ! I just upgraded my old "roller ball" BlackBerry to an iPhone 4 this year, and just disconnected my fax machine after several years of not getting even one single fax - and I'm still keeping them in the garage, just in case I need them again someday!
Reply
#49
I think I've invested too much time into learning kanji to be arguing for their abolition, and I guess it's the same for the Japanese and most of us here. I think that's the reason they still have them, because I actually think they don't add any (enough?) benefit to be worth so much study.
Reply