http://forum.koohii.com/showthread.php?tid=12773&page=2
In that thread, cophnia61 compared the first 7k words of Core 10k with the first 7k words of the "famous list based on the innocent novel corpus", using the LN Zero no Tsukaima as the test text. According to cb's Japanese Text Analysis Tool, the results were apparently 28% total coverage with Core and 91% coverage with the first 7k of the innocent corpus list.
That can't possibly be right, right? I'd imagine that the 7k list is superior because it draws on innocent reviews of novels, but I can't imagine the difference is that great.
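For reference, here is a rough Python sketch of how I understand a coverage figure like this could be computed. I don't know how cb's tool actually defines "total coverage", so this is only one plausible reading. The `coverage` function and the toy data are made up for illustration, and the text is assumed to be already split into words (real Japanese text would need a morphological analyzer for that step).

```python
from collections import Counter

def coverage(tokens, known_words):
    """Return (token_coverage, type_coverage) as percentages.

    token_coverage: share of all running words that appear in known_words.
    type_coverage:  share of distinct words that appear in known_words.
    Hypothetical helper; not taken from any existing tool.
    """
    known = set(known_words)
    counts = Counter(tokens)
    total_tokens = sum(counts.values())
    if total_tokens == 0:
        return 0.0, 0.0
    known_tokens = sum(n for word, n in counts.items() if word in known)
    known_types = sum(1 for word in counts if word in known)
    return (100.0 * known_tokens / total_tokens,
            100.0 * known_types / len(counts))

# Toy example: a pre-segmented "text" and a tiny word list (both invented).
tokens = ["は", "は", "は", "魔法", "使い魔", "は"]
word_list = ["は", "魔法"]
token_pct, type_pct = coverage(tokens, word_list)
print(f"token coverage: {token_pct:.1f}%, type coverage: {type_pct:.1f}%")
```

The two numbers can differ a lot for the same list and text, so it matters which one a tool reports when comparing figures like 28% and 91%.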
Anyway, does anyone have a link to this innocent corpus? In list form? Like, all put together and whatnot? Or a set of Anki cards (or a spreadsheet to make them from)?
cophnia61 also apparently put up the list he had on Pastebin, but it's unfortunately defunct.
http://forum.koohii.com/showthread.php?p...#pid214948
Edited: 2015-10-02, 6:19 pm
