blackmacros Wrote:What sort of format is this 'matrix' thing in? I'm sure there is a spreadsheet somewhere on this forum for something like the Kanji Kentei (is that it? the exam that tests kanji knowledge, for native speakers), which would have like 6000+ kanji in it right? Is something like that exportable into this matrix you're talking about?completely unrelated
EDIT: I was thinking of this spreadsheet. http://spreadsheets.google.com/ccc?key=p...yZHw&hl=en
~6500 kanji (as well as keywords) in a spreadsheet.
A matrix is the equivalence table that OCR software uses to recognize an image and turn it into a symbol. As you probably already know, subtitles on a DVD are not stored as text files (srt, ass, ssa, etc.) but as image files (sub vob).
The job of the OCR software is to turn those images into text by identifying each letter. Now imagine a child prodigy who can learn the calligraphy of thousands of kanji in the blink of an eye. No matter how good he is, you still have to teach him what each kanji means, the stroke order, etc. Well, the OCR software is infinitely more efficient than you at copying the subtitles, but it still needs to be taught: for each and every symbol it doesn't recognize, you must type in the text-based counterpart. And it's especially dumb: it can't recognize the same character when it is displayed in italic, bold, or underlined style. You've got to teach it everything.
That's not much of a problem with the western alphabet, but with kanji... see above.
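To make the idea concrete, here is a toy sketch in Python of what such a matrix does. Everything in it is made up for illustration (the class name, the tiny 3x3 "bitmaps", the fake user), and it matches glyphs exactly, whereas real OCR programs are usually more tolerant, so take it as a picture of the principle rather than of how any particular tool works.

```python
from typing import Callable, Dict, List, Tuple

# A glyph is a tiny bitmap: one string per row, '#' = ink, '.' = blank.
# (Hypothetical representation, chosen only to keep the example small.)
Glyph = Tuple[str, ...]


class CharacterMatrix:
    """Equivalence table: glyph image -> text character."""

    def __init__(self) -> None:
        self.table: Dict[Glyph, str] = {}

    def recognize(self, glyph: Glyph, ask_user: Callable[[Glyph], str]) -> str:
        # Unknown image: the software has to be taught, so ask the user once
        # and remember the answer for next time.
        if glyph not in self.table:
            self.table[glyph] = ask_user(glyph)
        return self.table[glyph]


def ocr_line(glyphs: List[Glyph], matrix: CharacterMatrix,
             ask_user: Callable[[Glyph], str]) -> str:
    """Turn a sequence of glyph images into a text string."""
    return "".join(matrix.recognize(g, ask_user) for g in glyphs)


# Three made-up glyphs. The italic I is a different image from the regular I,
# so the matrix treats it as a brand new symbol.
I_REGULAR: Glyph = (".#.", ".#.", ".#.")
I_ITALIC: Glyph = ("..#", ".#.", "#..")
L_REGULAR: Glyph = ("#..", "#..", "###")

# Stand-in for the person typing answers at the keyboard.
answers = {I_REGULAR: "I", I_ITALIC: "I", L_REGULAR: "L"}
asked: List[Glyph] = []


def fake_user(glyph: Glyph) -> str:
    asked.append(glyph)
    return answers[glyph]


matrix = CharacterMatrix()
text = ocr_line([I_REGULAR, L_REGULAR, L_REGULAR, I_ITALIC, I_REGULAR],
                matrix, fake_user)
print(text)        # ILLII
print(len(asked))  # 3 questions: regular I, L, and italic I
```

Five glyphs, but only three questions: repeats are free, while the italic I costs a new question even though it is the same letter. With a couple of dozen letters that is tolerable; with thousands of kanji, each in several styles, the table gets very long. It also shows why a spreadsheet of kanji and keywords can't simply be exported into a matrix: the table is keyed on the images, and the spreadsheet only has the text side.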
Edited: 2009-08-19, 8:56 am
