kanji koohii FORUM
My study method (Study pack included) - Printable Version


My study method (Study pack included) - wccrawford - 2010-02-14

nest0r Wrote:a similar experiment.
Along the same lines as that experiment, apparently if you try to install it with Wine in Linux, it runs perfectly fine. In theory, I mean.


My study method (Study pack included) - captal - 2010-02-14

Thanks heaps raseru- this should be really helpful as I've been trying to up my reading quantity. I bought a bunch of books before I left Japan a couple months ago but I haven't really started tackling them. I think I'll try going through some of the books you uploaded first and see how that goes.

edit: Is there a way to make the format prettier in Firefox? Some of them work fine, but I opened the first book of Harry Potter in Firefox and it's all readable but it's mashed together without any spacing/page breaks. If I open it in Word all the breaks are there and it's much more readable, but I can't use the quick look-up of Rikaichan.

edit2: Ok, figured it out- it works if I open the .txt file instead of the .html file in Firefox- then all of the formatting is preserved and Rikaichan still works.


My study method (Study pack included) - raseru - 2010-02-14

Weird, both work for me. Good thing the txt was still there then, I guess.
The code <pre style="white-space:pre-wrap"> is what allows you to view it in a browser without it being wacky. Before I found out how to do that I had to just suck it up, and even used an add-on that does word-wrap together with a plain <pre> lol
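
A minimal sketch of that wrapping step, in case anyone wants to build the .html from the .txt themselves. The function name and the sample text are made up for illustration; the only part taken from the post above is the <pre style="white-space:pre-wrap"> tag.

```python
import html


def wrap_text_as_html(text, title="book"):
    """Return a minimal HTML page that keeps the line breaks of `text`."""
    # Escape <, > and & so stray characters in the book cannot break the page.
    body = html.escape(text)
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8">'
        "<title>" + html.escape(title) + "</title></head>\n"
        '<body><pre style="white-space:pre-wrap">' + body + "</pre></body></html>\n"
    )


if __name__ == "__main__":
    # Made-up two-line sample; a real book would be read from its .txt file.
    print(wrap_text_as_html("一行目\n二行目", title="sample"))
```

pre-wrap keeps the original line breaks and spacing but still wraps long lines at the window edge, which is why no separate word-wrap add-on is needed.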


My study method (Study pack included) - nest0r - 2010-02-14

Ohh look, I'll need to rerun my 'simulation', as the 'Corporate' edition of the software theoretically allows for unlimited processing, rather than 50 pages max.


My study method (Study pack included) - Thora - 2010-02-14

Just wanted to include here something from another thread. The topic was “essential” kanji and vocab and what might be required for various types of reading. iSoron kindly analyzed some of raseru’s texts and provided a detailed breakdown of the kanji and the vocab in each work.
iSoron Wrote:Maybe you'll find this useful? http://isoron.org/stuff/japanese/analysis/

For the curious, I'm using Mecab plus a mess of bash, python and ruby scripts.
An article (“What Vocabulary Size is Needed to Read Unsimplified Texts for Pleasure?”) recommends pre-learning words which occur frequently in a text as a good way to improve reading fluency. (As opposed to intensive reading to learn every word or pre-learning hundreds of words which occur only once or twice.) Seems like common sense, but to actually create such vocab and frequency lists is beyond the ability of most readers. Thanks to iSoron, you have it!


My study method (Study pack included) - raseru - 2010-02-14

Nice

You guys wouldn't happen to know a way to get rid of the rubi/furigana? It screws up rikai-chan.
If only there was a way to replace (like ctrl+h) the inside as well, like "《~》"


My study method (Study pack included) - raseru - 2010-02-14

Haha, just thought of a way and it works, but it's pretty ghetto. (if there's a better way, feel free to say it)

open the .html file in a text editor
press ctrl + h
Find : 《
replace : <!--
Find : 》
replace : -->

Would be even cooler if you could just make rikai-chan skip it though because occasionally readings are supposed to be something different

Edit: Doesn't work :( Rikai-chan won't skip past the invisible html
Edit: You can copy all of the text in the HTML and replace it in the txt, then rikai-chan will work. Kind of a pain though.


My study method (Study pack included) - wccrawford - 2010-02-14

raseru Wrote:Nice

You guys wouldn't happen to know a way to get rid of the rubi/furigana? It screws up rikai-chan.
If only there was a way to replace (like ctrl+h) the inside as well, like "《~》"
A good editor will let you use RegEx to replace. Even the most basic ones in Linux do it. I dunno what to recommend for Windows, though. http://www.jujusoft.com/software/edit/ maybe. I haven't tried it.
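
For anyone without a regex-capable editor, here is a minimal sketch of the same replace in Python. It assumes the readings are always written as 《…》 with no nesting, as in the example above; the function name is just for illustration.

```python
import re

# 《, then anything that is not 》, then 》. Non-nested readings only.
RUBY = re.compile(r"《[^》]*》")


def strip_furigana(text):
    """Delete every 《reading》 span and leave the base text untouched."""
    return RUBY.sub("", text)


if __name__ == "__main__":
    print(strip_furigana("漢字《かんじ》を読《よ》む"))
```

In an editor with regex search, the same pattern 《[^》]*》 replaced with nothing should do the equivalent, provided the editor handles the file's encoding.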


My study method (Study pack included) - nest0r - 2010-02-14

Now someone needs to take 100 .txt files of light novels, run an analysis on the vocabulary, and sort the 95-98% most common vocabulary items and their kanji. Isn't that a good idea? I bet between us all, we could get a corpus of far more than 100.
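
A rough sketch of the "95-98% most common" part, assuming the words have already been split out and counted by some other tool (Japanese has no spaces, so that step needs a tokenizer such as the Mecab setup mentioned earlier). The function name and the toy counts are made up.

```python
from collections import Counter


def items_for_coverage(counts, target):
    """Return the most frequent items that together reach `target`
    (a fraction such as 0.95) of all occurrences."""
    total = sum(counts.values())
    covered = 0
    chosen = []
    for item, n in counts.most_common():
        if covered >= target * total:
            break
        chosen.append(item)
        covered += n
    return chosen


if __name__ == "__main__":
    # Toy numbers only, not from any real novel.
    toy = Counter({"する": 90, "言う": 6, "鬨": 3, "日露": 1})
    print(items_for_coverage(toy, 0.95))
    print(items_for_coverage(toy, 0.98))
```

The list comes out sorted by frequency, so it can be used directly as a study order.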


My study method (Study pack included) - wccrawford - 2010-02-14

nest0r Wrote:Now someone needs to take 100 .txt files of light novels, run an analysis on the vocabulary, and sort the 95-98% most common vocabulary items and their kanji. Isn't that a good idea? I bet between us all, we could get a corpus of far more than 100.
Actually, I'd rather pick the 'easiest' ones and do that with them... Getting into novels is the hard part. Once you're in, the rest is cake.

I have to admit that analysis would be a lot better than the newspaper frequency list. Who really reads newspapers any more?


My study method (Study pack included) - raseru - 2010-02-14

《 doesn't seem to work in ctrl + h in that software

nest0r Wrote:Now someone needs to take 100 .txt files of light novels, run an analysis on the vocabulary, and sort the 95-98% most common vocabulary items and their kanji. Isn't that a good idea? I bet between us all, we could get a corpus of far more than 100.
One thing I noticed is authors often use the same vocabulary, so if you aim for maybe different authors as well, it might be more accurate


My study method (Study pack included) - nest0r - 2010-02-14

raseru Wrote:《 doesn't seem to work in ctrl + h in that software

nest0r Wrote:Now someone needs to take 100 .txt files of light novels, run an analysis on the vocabulary, and sort the 95-98% most common vocabulary items and their kanji. Isn't that a good idea? I bet between us all, we could get a corpus of far more than 100.
One thing I noticed is authors often use the same vocabulary, so if you aim for maybe different authors as well, it might be more accurate
Hmm, maybe only 2-3 volumes per author? Enough to get a good sampling of their style across works.
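
A small sketch of that cap, assuming the corpus is kept as a list of (author, title) pairs; the names and the default limit of 3 are placeholders.

```python
from collections import defaultdict


def cap_per_author(volumes, limit=3):
    """Keep at most `limit` volumes per author, in their original order."""
    seen = defaultdict(int)
    kept = []
    for author, title in volumes:
        if seen[author] < limit:
            kept.append((author, title))
            seen[author] += 1
    return kept


if __name__ == "__main__":
    corpus = [("A", "vol 1"), ("A", "vol 2"), ("A", "vol 3"),
              ("A", "vol 4"), ("B", "vol 1")]
    print(cap_per_author(corpus))
```

This takes the first volumes it sees for each author; picking them at random instead would be an equally reasonable choice.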


My study method (Study pack included) - nest0r - 2010-02-14

wccrawford Wrote:
nest0r Wrote:Now someone needs to take 100 .txt files of light novels, run an analysis on the vocabulary, and sort the 95-98% most common vocabulary items and their kanji. Isn't that a good idea? I bet between us all, we could get a corpus of far more than 100.
Actually, I'd rather pick the 'easiest' ones and do that with them... Getting into novels is the hard part. Once you're in, the rest is cake.

I have to admit that analysis would be a lot better than the newspaper frequency list. Who really reads newspapers any more?
I'm not sure what you mean. Aren't all light novels in the same basic range? I'm a n00b.


My study method (Study pack included) - raseru - 2010-02-14

yeah that'd probably be good. Just saying if like 15 of the volumes are from one guy it might be a little off, haha


My study method (Study pack included) - raseru - 2010-02-14

nest0r Wrote:
wccrawford Wrote:
nest0r Wrote:Now someone needs to take 100 .txt files of light novels, run an analysis on the vocabulary, and sort the 95-98% most common vocabulary items and their kanji. Isn't that a good idea? I bet between us all, we could get a corpus of far more than 100.
Actually, I'd rather pick the 'easiest' ones and do that with them... Getting into novels is the hard part. Once you're in, the rest is cake.

I have to admit that analysis would be a lot better than the newspaper frequency list. Who really reads newspapers any more?
I'm not sure what you mean. Aren't all light novels in the same basic range? I'm a n00b.
Newspapers have really really stupid words that shouldn't exist like 日露. I saw one kanji in a newspaper before and asked 2 Japanese high schoolers to read it and they couldn't (鬨). Even with context and me telling them how to read it, they still didn't know the word.


My study method (Study pack included) - wccrawford - 2010-02-14

nest0r Wrote:
wccrawford Wrote:
nest0r Wrote:Now someone needs to take 100 .txt files of light novels, run an analysis on the vocabulary, and sort the 95-98% most common vocabulary items and their kanji. Isn't that a good idea? I bet between us all, we could get a corpus of far more than 100.
Actually, I'd rather pick the 'easiest' ones and do that with them... Getting into novels is the hard part. Once you're in, the rest is cake.

I have to admit that analysis would be a lot better than the newspaper frequency list. Who really reads newspapers any more?
I'm not sure what you mean. Aren't all light novels in the same basic range? I'm a n00b.
That analysis posted above shows that 1 uses 1500 kanji, 1 uses 2000, and the rest use 2500. That's fairly different, with the lowest using only 3/5 the kanji of the highest.
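
For reference, a "kanji used" figure like those can be approximated with a few lines of Python. This is only a sketch, not how the analysis above was actually produced: it counts characters in the basic CJK Unified Ideographs block (U+4E00 to U+9FFF), so rare extension kanji and the repeat mark 々 are not counted.

```python
import re

# Basic CJK Unified Ideographs block only (an assumption, see note above).
KANJI = re.compile(r"[\u4e00-\u9fff]")


def distinct_kanji(text):
    """Return the set of different kanji that occur in `text`."""
    return set(KANJI.findall(text))


if __name__ == "__main__":
    sample = "日本語の本を読む。日本"
    print(len(distinct_kanji(sample)))
```

Comparing len(distinct_kanji(...)) across books gives the kind of ratio quoted above, e.g. 1500 / 2500 = 3/5.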

But just like comics, there's low-end (Yotsuba&), middle (Death Note), and high-end (Read or Die).

Someone posted a nice link to a site that had a lot of easy light novels, and even had previews of them... But I can't find it now and I lost my bookmark in a freak accident.


My study method (Study pack included) - nest0r - 2010-02-14

raseru Wrote:
nest0r Wrote:
wccrawford Wrote:Actually, I'd rather pick the 'easiest' ones and do that with them... Getting into novels is the hard part. Once you're in, the rest is cake.

I have to admit that analysis would be a lot better than the newspaper frequency list. Who really reads newspapers any more?
I'm not sure what you mean. Aren't all light novels in the same basic range? I'm a n00b.
Newspapers have really really stupid words that shouldn't exist like 日露. I saw one kanji in a newspaper before and asked 2 Japanese highschoolers to read it and they couldn't (鬨)
Oh, I was referring to the 'easiest' part of his comment. Well, I think I might already have like 50 .txt files to contribute<---That's what I would say if I were a pirate, which I'm not. So I definitely won't be posting any Google search suggestions to avoid later.


My study method (Study pack included) - wccrawford - 2010-02-14

http://shop.kodansha.jp/bc2_bc/search_view.jsp?b=1487310 - Found it. That's a pretty easy light novel. The link was in the Harry Potter thread, btw.


My study method (Study pack included) - nest0r - 2010-02-14

wccrawford Wrote:That analysis posted above shows that 1 uses 1500 kanji, 1 uses 2000, and the rest use 2500. That's fairly different, with the lowest using only 3/5 the kanji of the highest.

But just like comics, there's low-end (Yotsuba&), middle (Death Note), and high-end (Read or Die).

Someone posted a nice link to a site that had a lot of easy light novels, and even had previews of them... But I can't find it now and I lost my bookmark in a freak accident.
I see. Well as long as we preserve the breakdown of each work before doing a 'meta' analysis, it shouldn't be too much of a problem.
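
One way that could look, as a sketch: keep one counter per work and build the combined one from those, so the per-work numbers are never lost. The function name is made up, and kanji are again taken to be the characters from U+4E00 to U+9FFF.

```python
from collections import Counter


def analyse_corpus(works):
    """`works` maps a title to its full text.
    Returns (per-work kanji counters, combined counter)."""
    per_work = {
        title: Counter(ch for ch in text if "\u4e00" <= ch <= "\u9fff")
        for title, text in works.items()
    }
    combined = Counter()
    for counts in per_work.values():
        combined.update(counts)  # update() adds counts, it does not overwrite
    return per_work, combined


if __name__ == "__main__":
    per_work, combined = analyse_corpus({"w1": "日本日", "w2": "本語"})
    print(per_work)
    print(combined.most_common())
```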


My study method (Study pack included) - raseru - 2010-02-14

I wonder if that OCR program is even all that accurate though. Like even if it's correct 95% of the time, 5% of the time you'll just be super confused. I think some of the txt files were possibly marketed in that format and not OCR'd to begin with

I shall... experiment with it myself, when it "arrives" on my computer


My study method (Study pack included) - nest0r - 2010-02-14

raseru Wrote:I wonder if that OCR program is even all that accurate though. Like even if it's correct 95% of the time, 5% of the time you'll just be super confused. I think some of the txt files were possibly marketed in that format and not OCR'd to begin with

I shall... experiment with it myself, when it "arrives" on my computer
Seems pretty accurate to me, but for my purposes it's more trouble than it's worth to correct the errors and tweak the formatting, etc. Actually I don't know; trying more samples, it seems to make some really dumb errors. Like it converts furigana to random words or something.


My study method (Study pack included) - nest0r - 2010-02-15

Whatever you do, don't do a Google search for 'raw light novels txt', you might turn up an alphanumeric MU code that links to a messy batch of virus-free files. Although, I bet some immoral person could use them for corpus analysis, if they knew how to do that sort of thing.


My study method (Study pack included) - raseru - 2010-02-15

Holy crap that is a lot of stories, wherever did you find all of these? -- is what I would say if I was some kind of a pirate


My study method (Study pack included) - nest0r - 2010-02-15

raseru Wrote:Holy crap that is a lot of stories, where ever did you find all of these? -- is what I would say if I was some kind of a pirate
Which ones? Those, or the ones that come up if your hand slips and you type 'raw light novels txt batch 2'?


My study method (Study pack included) - raseru - 2010-02-15

... I.. Don't think I'll ever run out of txt books to read. Ever. o.o