
多読 Tadoku Reading Challenge

(2017-06-12, 10:05 am)sholum Wrote: For something a little different, I got 狼と香辛料, which I tried to read way back and couldn't; might help bridge the gap between my forte (YA fiction, apparently) and everything else, with all the economics.
I quite like this series (I'm up to about volume 14 or so now), but I think it is a bit trickier than the average light novel. In particular, the banter between the main characters can be tricky to understand because one of the core dynamics is "Horo sets a verbal trap, or expects Lawrence to figure out why she's annoyed from context", so there's a lot of dialogue where people don't actually come out and say what they mean. Horo also has her own dialect with weird verb conjugations, but that's not too hard to get used to. There's also the usual specialist vocab that comes with a medieval fantasy setting.
(2017-06-12, 3:07 pm)pm215 Wrote:
(2017-06-12, 10:05 am)sholum Wrote: For something a little different, I got 狼と香辛料, which I tried to read way back and couldn't; might help bridge the gap between my forte (YA fiction, apparently) and everything else, with all the economics.
I quite like this series (I'm up to about volume 14 or so now), but I think it is a bit trickier than the average light novel. In particular, the banter between the main characters can be tricky to understand because one of the core dynamics is "Horo sets a verbal trap, or expects Lawrence to figure out why she's annoyed from context", so there's a lot of dialogue where people don't actually come out and say what they mean. Horo also has her own dialect with weird verb conjugations, but that's not too hard to get used to. There's also the usual specialist vocab that comes with a medieval fantasy setting.

Holy crap, other people are reading this series too. Totally agree with what you guys have said, and there's not much for me to add (I got up to volume 10 and still have the rest sitting on my shelf; it would be nice if the author STOPPED MAKING MORE so I can eventually "complete" it, but yes, I know vol 17 is basically the actual end dfljbsldf). It probably wasn't a good choice for a first series to get into, but reading it had been a goal of mine for a long, long time. Frankly, I still have trouble understanding stuff from this series. One of the earlier volumes had a side story with some metaphor about Lawrence being a sheep, and it was totally lost on me; I was confused about why the hell they were talking about sheep so much and what relevance the conversation had to anything lol.

EDIT:

Here's something interesting since otk (my boring abbreviation) is the first series I started mining vocab out of (so this can somewhat loosely be taken perhaps as a vague indicator of how much overlap there is in vocab):

Vocab cards from chronological order (so vol1, vol2, etc.):
1642, 1045, 694, 588, 568, 391, 118, 111, 119, 192, 92, 97, 109, 118

Vol 15 onwards is missing because I don't have digital versions of those. There's some キノの旅 vocab and some other series I mined in between the creation of those otk cards so it's definitely not a true indicator of overlap but the plateauing trend I thought was neat.
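
To put rough numbers on that plateau, here is a minimal Python sketch that totals the per-volume card counts listed above and shows how much of the total had accumulated by each volume. The function name `cumulative_share` is made up for illustration; the only data used is the list of counts from this post.

```python
# Card counts for vol 1 through vol 14, copied from the list above.
counts = [1642, 1045, 694, 588, 568, 391, 118, 111, 119, 192, 92, 97, 109, 118]


def cumulative_share(per_volume):
    """Return, for each volume, the fraction of all cards created up to and including it."""
    total = sum(per_volume)
    shares = []
    running = 0
    for n in per_volume:
        running += n
        shares.append(running / total)
    return shares


# Print one row per volume: new cards and cumulative share of the total.
for vol, (n, share) in enumerate(zip(counts, cumulative_share(counts)), start=1):
    print(f"vol {vol:2d}: {n:5d} new cards, {share:6.1%} of all cards so far")
```

The first six volumes account for 4,928 of the 5,884 cards, roughly 84%, which is the plateau in numeric form. The same caveat applies: cards mined from other series in between are not counted here.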
Edited: 2017-06-12, 4:17 pm
(2017-06-12, 4:08 pm)karageko Wrote: Here's something interesting since otk (my boring abbreviation) is the first series I started mining vocab out of (so this can somewhat loosely be taken perhaps as a vague indicator of how much overlap there is in vocab):

Vocab cards from chronological order (so vol1, vol2, etc.):
1642, 1045, 694, 588, 568, 391, 118, 111, 119, 192, 92, 97, 109, 118

Vol 15 onwards is missing because I don't have digital versions of those. There's some キノの旅 vocab and some other series I mined in between the creation of those otk cards so it's definitely not a true indicator of overlap but the plateauing trend I thought was neat.

Can I ask how you mined your book? I've tried mining before but I get a lot of false positives or some really random grammar. Just a link to the tool you've used is fine.
(2017-06-12, 5:50 pm)uchuu Wrote:
(2017-06-12, 4:08 pm)karageko Wrote: Here's something interesting since otk (my boring abbreviation) is the first series I started mining vocab out of (so this can somewhat loosely be taken perhaps as a vague indicator of how much overlap there is in vocab):

Vocab cards from chronological order (so vol1, vol2, etc.):
1642, 1045, 694, 588, 568, 391, 118, 111, 119, 192, 92, 97, 109, 118

Vol 15 onwards is missing because I don't have digital versions of those. There's some キノの旅 vocab and some other series I mined in between the creation of those otk cards so it's definitely not a true indicator of overlap but the plateauing trend I thought was neat.

Can I ask how you mined your book? I've tried mining before but I get a lot of false positives or some really random grammar. Just a link to the tool you've used is fine.

I mined the first couple of volumes very manually, using Rikaisama as I read along and copying and pasting sentences and definitions directly into Anki. This was so goddamn time-consuming that I later switched to a script I coded myself. I originally intended to release it to the public but never did (there's spaghetti that needs to be purged and some better documentation that should be added). To be honest, it sort of seems like some MorphMan-related thing would do a better job of what it tries to do, except that as far as I'm aware you can't have MorphMan spit out the sentences from the input text that contain unknown words. Basically, my script determines your existing vocab knowledge by looking through your Anki db, finds all the words in an input text that are not in that db, and then spits out the sentences from the input text in the order the unknown words were discovered, along with definitions grabbed from EDICT and from the EPWING dictionaries I use (the EPWING lookup is a recent feature I implemented only a few months ago). A single sentence may contain more than one unknown word.
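
Since the script itself was never released, here is only a rough Python sketch of the flow described above, not the actual code. Everything named here is hypothetical: `mine_sentences`, `split_sentences`, and `toy_tokenize` are illustration names, the known-word set stands in for whatever gets read out of the Anki db, and the whitespace tokenizer stands in for MeCab so the example runs without any extra installs. Definition lookup is left out.

```python
import re

# Sentence-ending punctuation used to cut the input text into sentences.
SENTENCE_END = "。！？"


def split_sentences(text):
    """Split text into sentences, keeping the closing punctuation on each one."""
    pattern = f"[^{SENTENCE_END}]+[{SENTENCE_END}]*"
    return [s.strip() for s in re.findall(pattern, text) if s.strip()]


def toy_tokenize(sentence):
    """Stand-in for a real morphological analyser (assumption: the text is pre-spaced)."""
    return [w for w in sentence.split() if w not in set(SENTENCE_END)]


def mine_sentences(text, known_words, tokenize):
    """Return (sentence, new_words) pairs in the order unknown words first appear.

    known_words stands in for the vocabulary pulled from the Anki db.
    A word is reported only once, in the first sentence where it shows up.
    """
    seen = set(known_words)
    results = []
    for sentence in split_sentences(text):
        new_words = []
        for word in tokenize(sentence):
            if word not in seen:
                seen.add(word)
                new_words.append(word)
        if new_words:  # sentences with nothing new are skipped
            results.append((sentence, new_words))
    return results


# Small demo with a made-up, pre-spaced sample text.
sample = "狼 と 香辛料 。 狼 は 賢い 。 狼 と 狼 。 行商人 が 林檎 を 買う 。"
known = {"狼", "と", "は", "が", "を"}
for sentence, words in mine_sentences(sample, known, toy_tokenize):
    print(sentence, "->", ", ".join(words))
```

Passing the tokenizer in as a function keeps the sketch independent of any particular parser; a MeCab-backed function that returns dictionary forms could be swapped in for `toy_tokenize`.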

It's definitely not perfect, because it relies on MeCab to parse and separate individual words correctly, which it doesn't always do. Only words that have actual entries in EDICT or the EPWING dictionaries I use are considered. Even then, I still keep a separate newline-separated text file of words to ignore, because in my opinion not every word is worth making an Anki card for, and the Anki db doesn't capture your actual knowledge. This ignore list is where the false positives or random grammar you're talking about might end up going. I actually still have to do a lot of manual filtering afterwards: I usually want to highlight more context around an unknown word, and words whose meanings are blatantly obvious from the individual kanji get added to the ignore list. So it's a pretty unwieldy tool until you've done a bunch of manual work to fine-tune an ignore list, I suppose. For me, however, it is an extreme time-saver compared to what I was doing at the beginning: making 500 cards with my initial process would probably have taken 2 weeks or more, whereas now it takes me 2-3 full days to filter 1000 cards down to 500 (actual example from a week ago).
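
The two filters described above (dictionary entry required, ignore list applied) could look something like the following Python sketch. Again, this is an assumption-laden illustration rather than the real script: `parse_ignore_list`, `load_ignore_list`, and `filter_candidates` are hypothetical names, and a plain dict stands in for the EDICT/EPWING lookups.

```python
from pathlib import Path


def parse_ignore_list(contents):
    """Turn newline-separated text into a set of words, skipping blank lines."""
    return {line.strip() for line in contents.splitlines() if line.strip()}


def load_ignore_list(path):
    """Read the ignore file if it exists; a missing file means nothing is ignored."""
    p = Path(path)
    if not p.exists():
        return set()
    return parse_ignore_list(p.read_text(encoding="utf-8"))


def filter_candidates(words, dictionary, ignore):
    """Keep words that have a dictionary entry and are not on the ignore list.

    dictionary is a plain word -> definition mapping here (a stand-in for
    real dictionary lookups). Input order is preserved.
    """
    return [(w, dictionary[w]) for w in words if w in dictionary and w not in ignore]


# Demo with made-up data: one mis-parsed fragment, one ignored word.
toy_dictionary = {"香辛料": "spices", "行商人": "peddler", "林檎": "apple"}
ignore = parse_ignore_list("林檎\n\nじゃろ\n")
candidates = ["香辛料", "じゃろ", "林檎", "行商人", "ぬしよ"]
for word, definition in filter_candidates(candidates, toy_dictionary, ignore):
    print(word, "=", definition)
```

Fragments the parser splits badly tend to have no dictionary entry and drop out at the first check; the ignore list then catches real words that simply aren't worth a card.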

I might make a video or something about my process, since some people ask me what I do and it's sort of clunky to try and describe, but I'm not sure how useful such a video would be, seeing as the sole user of this script is me.
Edited: 2017-06-12, 10:30 pm