Joined: Oct 2007
Posts: 4,582
Thanks:
0
The Google explanation makes me think that's not what it is.
Joined: Oct 2007
Posts: 4,582
Thanks:
0
That is interesting; some factors to consider, but it seems surprisingly cognitive in orientation, towards processing load or somesuch, since it doesn't talk about other factors regarding the language. I wonder if the other 4 factors cover that?
Joined: Jan 2008
Posts: 1,458
Thanks:
20
I notice that by far the heaviest weighted factor is LC, which is probably a reasonable proxy for difficulty of vocab: lots of two-kanji (or more) compounds and your LC score goes up; more 和語 and your LC goes down and your LH goes up...
Interesting that LS has such a light weighting there; I'd have guessed it would be worth more.
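To make the LC/LH idea concrete, here is a rough Python sketch, assuming LC and LH mean the average length of consecutive kanji runs and hiragana runs (that's how I read the description above). The names `script_of` and `run_stats` are made up for illustration, the Unicode ranges are a simplification, and the formula's actual weights are not reproduced here.

```python
from itertools import groupby
from typing import Dict, List


def script_of(ch: str) -> str:
    """Classify one character by script using (simplified) Unicode ranges."""
    cp = ord(ch)
    if 0x3041 <= cp <= 0x309F:
        return "hiragana"
    if 0x30A0 <= cp <= 0x30FF:
        return "katakana"
    if 0x4E00 <= cp <= 0x9FFF or cp == 0x3005:  # CJK ideographs, plus the repeat mark 々
        return "kanji"
    return "other"


def run_stats(text: str) -> Dict[str, float]:
    """Mean length of consecutive same-script runs, per script.

    Assumption: this is what the LC (kanji) and LH (hiragana) factors measure.
    """
    runs: Dict[str, List[int]] = {}
    for script, group in groupby(text, key=script_of):
        runs.setdefault(script, []).append(len(list(group)))
    # Punctuation, Latin letters, etc. are ignored in this sketch.
    return {s: sum(v) / len(v) for s, v in runs.items() if s != "other"}


if __name__ == "__main__":
    # A long compound pushes the kanji average up; 和語 written in kana pushes it down.
    print(run_stats("東京大学に行く"))
    print(run_stats("やまにいく"))
```

A text full of multi-kanji compounds gets a higher kanji average, which fits the idea that LC works as a proxy for vocab difficulty.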
Joined: Oct 2007
Posts: 4,582
Thanks:
0
Hmm, hadn't thought of it that way. Reminds me of the stick thingy in the Aozora texts, actually.
I wonder if the amount of カタカナ could indicate the amount of technical language and skew the results towards advanced?
Joined: Oct 2007
Posts: 4,582
Thanks:
0
Cool, yeah that's the linked program in the more info section that I was referring to—wasn't sure if you'd be able to get it working because of the MacOS thing, but when they said Ruby and Unix I figured you could.
Reading that paper more closely, it seems like their approach is both more robust and more effective than the modified version of Tateisi's formula that Hayashi used in 1992 (which is the 6 factor version posted above [ahha! Figured out why I kept typing ‘below’ when referencing posters' comments: on the edit page the direction is reversed]). It requires only 25 characters or so to be accurate and apparently can be used on anything, even stuff with incomplete sentences.
Does what they provide give the correlation coefficient, though? It looks like it just outputs the grade level (and it's been modified in Obi2 for 14 grades), but skimming through that offline code it seems like it can be modified to output other stuff.
Edit: Just read your Edit 2. Excellent! *steeples fingers*
Edit: And of course 25 characters would be the bare minimum. I think that chart showed 100-250 or something to be the most accurate baseline.
Edited: 2011-02-19, 3:07 pm
Joined: Aug 2008
Posts: 163
Thanks:
0
That online one that Nestor quoted seems OK. Basically, what they did was scan in a whole bunch of school textbooks by grade; for grade 13 they used university textbooks. They used that as a basis for how difficult a piece of text is. With some statistical magic, for any given text they can tell you which grade level it corresponds to. In the paper they said that a large amount of katakana gives an easier score. They don't know exactly why, but their theory is that Japanese school textbooks don't have that much katakana in them.
It actually doesn't matter what the score is really, as long as it is consistent. So if you put some text in and it gives you a score of X, then any more difficult text should score less than X, and any easier text more than X (with 0 being hard, 100 being easy).
Just out of curiosity I threw some stuff in:
Genki (I), 1st dialog: 1
Genki (I), last dialog: 5
HP 1, chap 8: 6
HP 7, chap 8: 6
Mainichi random articles: between 9 and 12
Linguaphone Dialog 1: 8
Linguaphone Dialog 2: 5
Linguaphone Dialog 50: 8
The Linguaphone scores are interesting, since dialog 1 is relatively simple. I think it's to do with the fact that the sentences are short but contain a relatively large number of kanji.
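For anyone wondering what the "statistical magic" might look like, here is a toy Python sketch of the textbook-corpus idea: build one character-frequency model per grade, then pick the grade whose model makes the input text most likely. The character-unigram model, the add-one smoothing, the tiny sample corpus, and the names `train` and `estimate_grade` are all my assumptions for illustration. This is not the actual Obi code.

```python
import math
from collections import Counter
from typing import Dict


def train(corpus_by_grade: Dict[int, str]) -> Dict[int, Counter]:
    """Count character frequencies for each grade's textbook text."""
    return {grade: Counter(text) for grade, text in corpus_by_grade.items()}


def estimate_grade(text: str, models: Dict[int, Counter]) -> int:
    """Return the grade whose character model gives the text the highest likelihood."""
    if not models:
        raise ValueError("need at least one grade model")
    # Shared vocabulary so add-one smoothing is comparable across grades.
    vocab = set(text)
    for counts in models.values():
        vocab.update(counts)
    v = len(vocab)

    best_grade, best_ll = -1, -math.inf
    for grade, counts in sorted(models.items()):
        total = sum(counts.values())
        # Log-likelihood of the text under this grade's smoothed unigram model.
        ll = sum(math.log((counts[ch] + 1) / (total + v)) for ch in text)
        if ll > best_ll:
            best_grade, best_ll = grade, ll
    return best_grade


if __name__ == "__main__":
    # Made-up two-grade "corpus", only to show the mechanics.
    models = train({
        1: "やまにいく。はながさく。",
        9: "経済政策を検討する。社会構造を分析する。",
    })
    print(estimate_grade("はながさく", models))
    print(estimate_grade("経済を分析する", models))
```

In a model like this, short kanji-dense sentences look like upper-grade text, which would fit the Linguaphone result.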
Joined: Oct 2007
Posts: 4,582
Thanks:
0
Do they eliminate katakana consideration from Obi2? Because they noted that the readability estimate becomes stable once they exclude them from the operative characters?
At any rate, I believe that novels won't all be consistent in their readability, i.e. one passage from 1Q84 might be shown as Grade 9 while another is ‘14’, but we're looking at overall scores... So if you're looking for fine-grained, reader-specific stuff, you might want to keep it to 200-2000 character passages.
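A quick Python sketch of the passage-by-passage idea: cut the text into fixed-size chunks and score each one separately. The `scorer` argument is a stand-in for whatever readability tool you use (I'm not assuming any particular interface for Obi2), and `split_passages` and `passage_grades` are names invented here.

```python
from typing import Callable, List


def split_passages(text: str, size: int = 1000) -> List[str]:
    """Cut text into consecutive passages of at most `size` characters."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [text[i:i + size] for i in range(0, len(text), size)]


def passage_grades(text: str, scorer: Callable[[str], int], size: int = 1000) -> List[int]:
    """Score every passage separately with the supplied (hypothetical) scorer."""
    return [scorer(passage) for passage in split_passages(text, size)]


if __name__ == "__main__":
    # Dummy scorer that just reports passage length, to show the plumbing.
    print(passage_grades("あ" * 2500, scorer=len, size=1000))
```

The last chunk can come out much shorter than the rest, so its score is probably the least trustworthy.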
Is 1Q84 (a friend of a friend got Grade 10 overall) not considered a book appropriately easy for Japanese high school students?
Joined: Oct 2007
Posts: 4,582
Thanks:
0
Wow, thank you for doing this.
So I wonder how accurate it is. ^_^ Judging by the titles, seems like it might line up well with grade level standards? For instance, a book of rakugo is 3, while a book on Foucault is 13.
@travis
My impression is that even with as little as 25 characters and incomplete sentences, it's more accurate than the Hayashi/Tateisi formula? I don't know, I guess I'll keep a mental note there for stuff heavy on katakana but otherwise assume Obi2 is always superior. Not that I'll be using this in any strict way.
Edited: 2011-02-19, 7:31 pm