Sentences galore? - Printable Version

+- kanji koohii FORUM (http://forum.koohii.com)
+-- Forum: Learning Japanese (http://forum.koohii.com/forum-4.html)
+--- Forum: Learning resources (http://forum.koohii.com/forum-9.html)
+--- Thread: Sentences galore? (/thread-2523.html)
Sentences galore? - cloudstrife543 - 2009-02-02

http://www.mahou.org/Kanji/

Try out this site if you are sentence mining, especially if you want kanji usage context. I just found it and figured I'd share and get others' opinions. Just type in a kanji and hit search, and it provides various example sentences along with the definition, stroke order, etc. I know people are always concerned about certain sentence mining sources not being 'natural Japanese', and that is another reason why I posted it here: to get some of your opinions after you've checked it out. Thank you for any comments!

Sentences galore? - Smackle - 2009-02-02

They are all from the Tanaka Corpus, which is sometimes unnatural Japanese.

Sentences galore? - GoodSirJava - 2009-02-02

The proper term for this, in case it interests anyone, is KWIC (Key Word in Context), and it is used by lexicographers to construct dictionaries.

Sentences galore? - Tobberoth - 2009-02-02

If it's the Tanaka corpus, I wouldn't use it. Tons of the sentences are unnatural and many of them aren't even correct.

Sentences galore? - howtwosavealif3 - 2009-02-02

Galore? It has anime all over. Pretty obvious it's not very good.

Sentences galore? - cloudstrife543 - 2009-02-02

Why does having anime on it make it bad? Wouldn't you sometimes get sentences from mining your favorite anime?

Sentences galore? - cloudstrife543 - 2009-02-02

Smackle Wrote: They are all from the Tanaka Corpus which is sometimes unnatural Japanese.

Where does it tell you this information?

Sentences galore? - woodwojr - 2009-02-02

Dictionary Information: #12: Japanese-English sentence pairs: "This is the big file of matched Japanese-English sentence pairs from the Tanaka corpus. It is the file as used by the WWWJDIC server."

That doesn't conclusively indicate that all sentence pairs are from the Tanaka corpus, but the site certainly contains it.

~J

Sentences galore? - cloudstrife543 - 2009-02-02

Smackle Wrote: They are all from the Tanaka Corpus which is sometimes unnatural Japanese.

I found where it says that, but it is the file loaded onto the WWWJDIC server, which they say has been updated regularly to fix the sentences that were messed up, though they caution that some might still be messed up. So... I guess it's not 100% reliable.

Sentences galore? - Delina - 2009-02-02

I use mahou a lot, but never for sentence-mining. It's convenient in that it has a great cross-reference for the kanji, including its frame number in Heisig and a bunch of other dictionaries and learning resources (KO2001, etc.). Sometimes I'll use the example sentences for reference when I'm trying to write something, but because they are translated from English by American college students, I do not use them as examples of 'real' Japanese for my SRS. (If you want sentences galore, Tanuki seems to be a better bet, but does not provide English translation.)

Here is an excerpt describing the origin of these sentences:

"From inspection, it appears that many of the sentence pairs have been derived from textbooks, e.g. books used by Japanese students of English."

For the full description, see the link below:
http://www.csse.monash.edu.au/~jwb/tanakacorpus.html

You are correct that it has been modified for use in WWWJDIC, but note that the bar for removal of 'bad' sentences is set pretty low - only grammatically "wrong" sentences have been removed, with no regard to whether they are "natural" Japanese sentences. Generally they are high on pronouns and read like something out of an old textbook:

"As described below, the Tanaka Corpus has been edited and adapted to be used within the WWWJDIC dictionary server as a set of example sentences associated with words in the dictionary. In order to adapt the corpus for this role, it has been edited as follows:

1. an initial regularization of the punctuation of the Japanese and English sentences was carried out, then duplicate pairs were removed, reducing the original file from 210,000 pairs to 180,000 pairs;
2. sentences which differed only by differences in orthography (e.g. kana/kanji usage, okurigana differences), numbers, proper names, minor grammatical points such as plain/polite verb usage, etc. were reduced to single representative examples;
3. sentences where the Japanese consisted of a short Japanese statement in kana were removed;
4. sentences with spelling errors, kana-kanji conversion errors, etc. were corrected;
5. sentences where the English version did not match the Japanese were edited to make the two versions agree;
6. where the sentences contain gender-specific language or words, the English portion has been tagged with [M] or [F] respectively;
7. sentences where the Japanese was too garbled to derive a valid English equivalent were removed."
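
For anyone curious what step 1 of that list (regularize the punctuation, then drop duplicate pairs) might look like in practice, here is a rough Python sketch. It is not the actual WWWJDIC cleanup script: the function names `regularize` and `dedupe_pairs` are made up for illustration, Unicode NFKC normalization plus whitespace collapsing is only an assumed stand-in for whatever "regularization" the maintainers really did, and the sample pairs are invented.

```python
import unicodedata


def regularize(text: str) -> str:
    """Fold width variants (e.g. half-width punctuation) with NFKC and
    collapse runs of whitespace. This is an ASSUMED stand-in for the
    corpus's real punctuation regularization, which the quote does not detail."""
    text = unicodedata.normalize("NFKC", text)
    return " ".join(text.split())


def dedupe_pairs(pairs):
    """Return regularized (japanese, english) pairs, keeping only the
    first occurrence of each pair and preserving the original order."""
    seen = set()
    kept = []
    for ja, en in pairs:
        key = (regularize(ja), regularize(en))
        if key not in seen:
            seen.add(key)
            kept.append(key)
    return kept


if __name__ == "__main__":
    # Invented sample data; not taken from the Tanaka Corpus.
    sample = [
        ("これはペンです。", "This is a pen."),
        ("これはペンです。 ", "This  is a pen."),  # same pair, stray spaces
        ("猫が好きです。", "I like cats."),
        ("猫が好きです｡", "I like cats."),  # half-width full stop
    ]
    for ja, en in dedupe_pairs(sample):
        print(ja, "|", en)
```

Note that this only catches pairs that become identical after normalization. Near-duplicates that differ in kana/kanji usage or politeness level (step 2) would need a much fuzzier comparison.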