My study method (Study pack included) - Printable Version

+- kanji koohii FORUM (http://forum.koohii.com)
+-- Forum: Learning Japanese (http://forum.koohii.com/forum-4.html)
+--- Forum: Learning resources (http://forum.koohii.com/forum-9.html)
+--- Thread: My study method (Study pack included) (/thread-5011.html)
My study method (Study pack included) - wccrawford - 2010-02-14

nest0r Wrote: a similar experiment.

Along the same lines as that experiment, apparently if you try to install it with Wine in Linux, it runs perfectly fine. In theory, I mean.

My study method (Study pack included) - captal - 2010-02-14

Thanks heaps raseru, this should be really helpful as I've been trying to up my reading quantity. I bought a bunch of books before I left Japan a couple of months ago but I haven't really started tackling them. I think I'll try going through some of the books you uploaded first and see how that goes.

edit: Is there a way to make the format prettier in Firefox? Some of them work fine, but I opened the first book of Harry Potter in Firefox and it's all readable but it's mashed together without any spacing/page breaks. If I open it in Word all the breaks are there and it's much more readable, but I can't use the quick look-up of Rikaichan.

edit2: Ok, figured it out. It works if I open the .txt file instead of the .html file in Firefox. Then all of the formatting is preserved and Rikaichan still works.

My study method (Study pack included) - raseru - 2010-02-14

Weird, both work for me. Good thing the txt was still there then, I guess.

The code <pre style="white-space:pre-wrap"> is what allows you to view it in a browser without it being wacky. Before I found out how to do that code I had to just suck it up, and even used an add-on that does word-wrap and used <pre> lol

My study method (Study pack included) - nest0r - 2010-02-14

Ohh look, I'll need to rerun my 'simulation', as the 'Corporate' edition of the software theoretically allows for unlimited processing, rather than 50 pages max.

My study method (Study pack included) - Thora - 2010-02-14

Just wanted to include here something from another thread. The topic was “essential” kanji and vocab and what might be required for various types of reading.
iSoron kindly analyzed some of raseru’s texts and provided a detailed breakdown of the kanji and the vocab in each work.

iSoron Wrote: Maybe you'll find this useful? http://isoron.org/stuff/japanese/analysis/

An article (“What Vocabulary Size is Needed to Read Unsimplified Texts for Pleasure?”) recommends pre-learning words which occur frequently in a text as a good way to improve reading fluency. (As opposed to intensive reading to learn every word, or pre-learning hundreds of words which occur only once or twice.) Seems like common sense, but actually creating such vocab and frequency lists is beyond the ability of most readers. Thanks to iSoron, you have it!

My study method (Study pack included) - raseru - 2010-02-14

Nice

You guys wouldn't happen to know a way to get rid of the rubi/furigana? It screws up rikai-chan. If only there was a way to replace (like ctrl+h) the inside as well, like "《~》"

My study method (Study pack included) - raseru - 2010-02-14

Haha, just thought of a way and it works, but it's pretty ghetto. (If there's a better way, feel free to say it.)

open the .html file in txt
press ctrl + h
Find : 《 replace : <!--
Find : 》 replace : -->

Would be even cooler if you could just make rikai-chan skip it though, because occasionally readings are supposed to be something different.

Edit: Doesn't work. Rikai-chan won't skip past the invisible html.

Edit: You can copy all of the text in the HTML and replace it in the txt, then rikai-chan will work. Kind of a pain though.

My study method (Study pack included) - wccrawford - 2010-02-14

raseru Wrote: Nice

A good editor will let you use RegEx to replace. Even the most basic ones in Linux do it. I dunno what to recommend for Windows, though. http://www.jujusoft.com/software/edit/ maybe. I haven't tried it.

My study method (Study pack included) - nest0r - 2010-02-14

Now someone needs to take 100 .txt files of light novels, run an analysis on the vocabulary, and sort the 95-98% most common vocabulary items and their kanji.
Isn't that a good idea? I bet between us all, we could get a corpus of far more than 100.

My study method (Study pack included) - wccrawford - 2010-02-14

nest0r Wrote: Now someone needs to take 100 .txt files of light novels, run an analysis on the vocabulary, and sort the 95-98% most common vocabulary items and their kanji. Isn't that a good idea? I bet between us all, we could get a corpus of far more than 100.

Actually, I'd rather pick the 'easiest' ones and do that with them... Getting into novels is the hard part. Once you're in, the rest is cake.

I have to admit that analysis would be a lot better than the newspaper frequency list. Who really reads newspapers any more?

My study method (Study pack included) - raseru - 2010-02-14

《 doesn't seem to work in ctrl + h in that software

nest0r Wrote: Now someone needs to take 100 .txt files of light novels, run an analysis on the vocabulary, and sort the 95-98% most common vocabulary items and their kanji. Isn't that a good idea? I bet between us all, we could get a corpus of far more than 100.

One thing I noticed is authors often use the same vocabulary, so if you aim for maybe different authors as well, it might be more accurate

My study method (Study pack included) - nest0r - 2010-02-14

raseru Wrote: 《 doesn't seem to work in ctrl + h in that software

Hmm, maybe only 2-3 volumes per author? Enough to get a good sampling of their style across works.

My study method (Study pack included) - nest0r - 2010-02-14

nest0r Wrote: Now someone needs to take 100 .txt files of light novels, run an analysis on the vocabulary, and sort the 95-98% most common vocabulary items and their kanji. Isn't that a good idea? I bet between us all, we could get a corpus of far more than 100.

wccrawford Wrote: Actually, I'd rather pick the 'easiest' ones and do that with them... Getting into novels is the hard part. Once you're in, the rest is cake.

I'm not sure what you mean. Aren't all light novels in the same basic range? I'm a n00b.
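The 《~》 find-and-replace discussed above can also be done with a regular expression, as wccrawford suggests. Below is a minimal Python sketch, not something any poster in the thread actually used. It assumes the readings are always wrapped in 《 and 》 with no nesting. It also assumes the texts may use a full-width bar ｜ to mark where the annotated word starts; that marker is not mentioned in the thread, and the function name `strip_furigana` is made up for illustration.

```python
import re

# Assumed convention: "｜base《reading》", where the full-width bar is optional.
# Neither pattern crosses a line break, so a stray bracket cannot eat a whole page.
BAR_RUBY = re.compile(r"｜([^｜《》\n]*)《[^《》\n]*》")
PLAIN_RUBY = re.compile(r"《[^《》\n]*》")


def strip_furigana(text):
    """Return text with every 《reading》 removed, keeping the base word."""
    # First drop the bar together with the reading it belongs to.
    text = BAR_RUBY.sub(r"\1", text)
    # Then drop any remaining readings that had no bar in front.
    return PLAIN_RUBY.sub("", text)


sample = "彼は｜魔法使い《まほうつかい》の石《いし》を見た。"
print(strip_furigana(sample))
```

Because this deletes the readings outright instead of hiding them in HTML comments, a pop-up dictionary has nothing left to trip over. The cost is the one raseru points out: any non-standard reading the author intended is lost.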
My study method (Study pack included) - raseru - 2010-02-14

yeah that'd probably be good. Just saying if like 15 of the volumes are from one guy it might be a little off, haha

My study method (Study pack included) - raseru - 2010-02-14

nest0r Wrote: Now someone needs to take 100 .txt files of light novels, run an analysis on the vocabulary, and sort the 95-98% most common vocabulary items and their kanji. Isn't that a good idea? I bet between us all, we could get a corpus of far more than 100.

wccrawford Wrote: Actually, I'd rather pick the 'easiest' ones and do that with them... Getting into novels is the hard part. Once you're in, the rest is cake.

nest0r Wrote: I'm not sure what you mean. Aren't all light novels in the same basic range? I'm a n00b.

Newspapers have really really stupid words that shouldn't exist like 日露. I saw one kanji in a newspaper before and asked 2 Japanese high schoolers to read it and they couldn't (鬨). Even with context and me telling them how to read it, they still didn't know the word

My study method (Study pack included) - wccrawford - 2010-02-14

nest0r Wrote: Now someone needs to take 100 .txt files of light novels, run an analysis on the vocabulary, and sort the 95-98% most common vocabulary items and their kanji. Isn't that a good idea? I bet between us all, we could get a corpus of far more than 100.

wccrawford Wrote: Actually, I'd rather pick the 'easiest' ones and do that with them... Getting into novels is the hard part. Once you're in, the rest is cake.

nest0r Wrote: I'm not sure what you mean. Aren't all light novels in the same basic range? I'm a n00b.

That analysis posted above shows that 1 uses 1500 kanji, 1 uses 2000, and the rest use 2500. That's fairly different, with the lowest using only 3/5 the kanji of the highest.

But just like comics, there's low-end (Yotsuba&), middle (Death Note), and high-end (Read or Die). Someone posted a nice link to a site that had a lot of easy light novels, and even had previews of them...
But I can't find it now and I lost my bookmark in a freak accident.

My study method (Study pack included) - nest0r - 2010-02-14

wccrawford Wrote: Actually, I'd rather pick the 'easiest' ones and do that with them... Getting into novels is the hard part. Once you're in, the rest is cake.

nest0r Wrote: I'm not sure what you mean. Aren't all light novels in the same basic range? I'm a n00b.

raseru Wrote: Newspapers have really really stupid words that shouldn't exist like 日露. I saw one kanji in a newspaper before and asked 2 Japanese highschoolers to read it and they couldn't (鬨)

Oh, I was referring to the 'easiest' part of his comment.

Well, I think I might already have like 50 .txt files to contribute <--- That's what I would say if I were a pirate, which I'm not. So I definitely won't be posting any Google search suggestions to avoid later.

My study method (Study pack included) - wccrawford - 2010-02-14

http://shop.kodansha.jp/bc2_bc/search_view.jsp?b=1487310 - Found it. That's a pretty easy light novel. The link was in the Harry Potter thread, btw.

My study method (Study pack included) - nest0r - 2010-02-14

wccrawford Wrote: That analysis posted above shows that 1 uses 1500 kanji, 1 uses 2000, and the rest use 2500. That's fairly different, with the lowest using only 3/5 the kanji of the highest.

I see. Well as long as we preserve the breakdown of each work before doing a 'meta' analysis, it shouldn't be too much of a problem.

My study method (Study pack included) - raseru - 2010-02-14

I wonder if that OCR program is even all that accurate though. Like even if it's correct 95% of the time, 5% of the time you'll just be super confused. I think some of the txt files were possibly marketed in that format and not OCR'd to begin with

I shall... experiment with it myself, when it "arrives" on my computer

My study method (Study pack included) - nest0r - 2010-02-14

raseru Wrote: I wonder if that OCR program is even all that accurate though.
Like even if it's correct 95% of the time, 5% of the time you'll just be super confused. I think some of the txt files were possibly marketed in that format and not OCR'd to begin with

Seems pretty accurate to me, but for my purposes it's more trouble than it's worth to correct the errors and tweak the formatting, etc.

Actually I don't know, trying more samples, seems like it makes some really dumb errors. Like it converts furigana to random words or something.

My study method (Study pack included) - nest0r - 2010-02-15

Whatever you do, don't do a Google search for 'raw light novels txt', you might turn up an alphanumeric MU code that links to a messy batch of virus-free files. Although, I bet some immoral person could use them for corpus analysis, if they knew how to do that sort of thing.

My study method (Study pack included) - raseru - 2010-02-15

Holy crap that is a lot of stories, wherever did you find all of these? -- is what I would say if I was some kind of a pirate

My study method (Study pack included) - nest0r - 2010-02-15

raseru Wrote: Holy crap that is a lot of stories, where ever did you find all of these? -- is what I would say if I was some kind of a pirate

Which ones? Those, or the ones that come up if your hand slips and you type 'raw light novels txt batch 2'?

My study method (Study pack included) - raseru - 2010-02-15

... I.. Don't think I'll ever run out of txt books to read. Ever. o.o
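The corpus idea from earlier in the thread (count what shows up across many novels, keep the per-work breakdown, then look at what covers 95-98% of the text) can be sketched roughly as follows. This is only an illustration under stated assumptions and is not iSoron's analysis. It counts kanji instead of vocabulary, because splitting Japanese into words needs a separate morphological analyzer. It treats only U+4E00 to U+9FFF as kanji, so rarer characters outside that block are ignored. The two "novels" are tiny made-up strings, and the names `kanji_counts` and `kanji_for_coverage` are hypothetical.

```python
from collections import Counter


def is_kanji(ch):
    """True for characters in the main CJK Unified Ideographs block (an assumption)."""
    return "\u4e00" <= ch <= "\u9fff"


def kanji_counts(text):
    """Count how often each kanji occurs in one work."""
    return Counter(ch for ch in text if is_kanji(ch))


def kanji_for_coverage(counts, target):
    """Most frequent kanji first, stopping once they cover `target` of all occurrences."""
    total = sum(counts.values())
    chosen, covered = [], 0
    for kanji, n in counts.most_common():
        if covered / total >= target:
            break
        chosen.append(kanji)
        covered += n
    return chosen


# Made-up stand-ins for real .txt files.
works = {
    "novel_a": "日本語の本を読む。本を読む日。",
    "novel_b": "私は本を買う。私は本を読む。",
}

# Keep the per-work breakdown before merging, as suggested in the thread.
per_work = {title: kanji_counts(text) for title, text in works.items()}
combined = sum(per_work.values(), Counter())

for title, counts in per_work.items():
    print(title, len(counts), "distinct kanji")
print("kanji needed for 95% coverage:", kanji_for_coverage(combined, 0.95))
```

Running the same coverage function on each entry of `per_work` would show how far individual works differ from the combined list, which is the concern wccrawford and raseru raise about easy versus hard novels and about one author dominating the corpus.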