Joined: May 2014
Posts: 195
Thanks:
15
What is the 5000 character thing all about? Does this mean it only scans the first 5000 characters of whatever I paste in and thus the analysis it gives isn't representative of the whole text?
I pasted in the Ring book and it gave me an easiness rating of 0.8, yet this was one of the easiest novels I've read. I then tried to paste in the first half of OUT and it says: Something went wrong. Please try again later.
Spiral got an easiness rating of 0.0! I agree it's harder than Ring, but that seems off somehow.
Edited: 2015-04-26, 9:34 am
Joined: Dec 2008
Posts: 22
Thanks:
0
Yes, at the moment it is only scanning the first 5000 characters. For the book list, I've limited it to 5000 characters as I don't have the full text for most of the books listed - but it may well make sense to scan the whole thing when you copy and paste, seeing as you want the score for everything, not just the first part. I'll have to experiment a bit to see how it goes.
Do you have a link to the Ring book where I could get the text? I'd love to see if I can figure out how to improve the algorithm to match your experience better!
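To make the 5000-character limit concrete, here is a minimal Python sketch of what scanning only a prefix means for a long paste. This is not the grader's actual code: the names `SCAN_LIMIT`, `scanned_portion` and `coverage` are made up for illustration, and the only detail taken from the thread is the 5000-character cutoff.

```python
SCAN_LIMIT = 5000  # characters; the cutoff mentioned in this thread


def scanned_portion(text: str, limit: int = SCAN_LIMIT) -> str:
    """Return the prefix of the text that would actually be analysed."""
    return text[:limit]


def coverage(text: str, limit: int = SCAN_LIMIT) -> float:
    """Fraction of the pasted text that falls inside the scanned prefix."""
    if not text:
        return 1.0  # nothing to miss in an empty paste
    return min(len(text), limit) / len(text)
```

For a 100,000-character novel, `coverage` comes out at 0.05, so under this assumption the rating would reflect only the opening pages.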
Joined: May 2014
Posts: 195
Thanks:
15
I'm also not exactly sure why the kanji section is displayed the way it is. The 6 columns almost suggest that each belongs to one of the 6 grades. The color-coding isn't immediately obvious, and there is room for further breakdown if you use kanken levels after grade 6. I think it'd be most useful if the kanji were displayed according to their frequency, because the way it is now is very scattershot. Also, 'and the percent of common kanji you'd need to learn before you'd be expected to know each' <-- I'm not sure I see this information anywhere.
(oh, yeah, link sent!)
Edited: 2015-04-26, 11:51 am
Joined: Dec 2008
Posts: 22
Thanks:
0
All great points, Roketzu.
I originally ordered the kanji by "painfulness", but it doesn't make as much sense now that the rating algorithm is focussed more on words. I'll add it to the list of things to fix up. Adding other kanken grades is also a great idea, as is displaying the percentile (I am calculating kanji percentiles, but forgot to add them to the view).
I also had a look at the book you're feeding into the grader - a short read makes me agree with you that it is pretty simple. It turns out a run of fairly simple words is being rated as quite hard, and since the current algorithm increases the score based on runs of "difficult" words instead of the average number of them, this is throwing things out a bit.
I'll spend a bit of time trying to fix it, as I'd really like to make the algorithm as useful as possible. It is probably worth noting though that this only makes easy books look difficult, not vice versa - so it shouldn't affect users browsing the list for easy books.
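As a rough illustration of why scoring runs of "difficult" words behaves differently from scoring their average, here is a small Python sketch. It is not the real algorithm: the squared-run-length weighting and the function names are assumptions chosen only to show the effect, and each word is reduced to a True/False "difficult" flag.

```python
from itertools import groupby


def average_penalty(flags):
    """Share of words flagged as difficult."""
    return sum(flags) / len(flags) if flags else 0.0


def run_penalty(flags):
    """Sum of squared lengths of consecutive difficult-word runs, per word.

    The squaring is a hypothetical weighting: it makes one long run cost
    more than the same number of difficult words spread out.
    """
    if not flags:
        return 0.0
    total = sum(
        len(list(group)) ** 2
        for is_hard, group in groupby(flags)
        if is_hard
    )
    return total / len(flags)


scattered = [True, False] * 4           # 4 difficult words, spread out
clustered = [True] * 4 + [False] * 4    # 4 difficult words in one run
```

Both lists have the same average penalty (0.5), but under this weighting the clustered one gets a run penalty of 2.0 against 0.5 for the scattered one. A single misrated run is enough to drag a text down, which fits an easy book coming out looking harder than it is.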
Lastly, anything above "0" is still considered pretty easy! There are some books in my collection which rate -100. I just round anything below 0 up to 0, as I'm measuring easiness, not difficulty.
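In code terms, the rounding described above amounts to clamping the score at zero. A one-function Python sketch, where `displayed_easiness` and `raw_score` are illustrative names rather than anything from the actual site:

```python
def displayed_easiness(raw_score: float) -> float:
    """Clamp negative raw scores to 0; positive scores pass through."""
    return max(0.0, raw_score)
```

So a displayed 0.0 only says the raw score was zero or lower. A book at -1 and a book at -100 would both show up as 0.0.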