For English, the general recommendation is that only 2-5% of the material should consist of unknown terms, and the difference between those levels seems large. I believe I read that for written English in general, without tailoring your materials to your level, this requires a learner to know something like 8000-9000 word families (~35,000 words). The difference across that 3% is big, though, on the order of thousands of word families.
Thought I'd quote this:
“Taken together, the research confirms that worthwhile vocabulary learning
does occur from reading. However, the pick-up rate is relatively low, and it
seems to be difficult to gain a productive level of mastery from just exposure.
Hill and Laufer (2003) estimate that, at the rates of incidental learning
reported in many studies, a L2 learner would have to read over 8 million
words of text, or about 420 novels to increase their vocabulary size by 2000
words. This is clearly a daunting prospect, and thus it is probably best not to
rely upon incidental learning as the primary source of the learning for new
words.
Rather, incidental learning seems to be better at enhancing knowledge
of words which have already been met. This conclusion is congruent
with Waring and Takaki’s (2003) findings that reading graded readers does not
lead to the learning of many new words, but that is very useful in developing
and enriching partially known vocabulary. Studies with a variety of test types
have shown that exposure leads to improvements in multiple types of word
knowledge. Also, given that repetition is key to learning words, the benefits of
repeated exposures in different contexts for consolidating fragile initial learning
and moving it along the path of incremental development cannot be
underestimated.”
- Instructed second language vocabulary learning (Norbert Schmitt)
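To get a feel for the Hill and Laufer numbers in that quote, here is a rough back-of-the-envelope sketch. The three input figures come straight from the quote; the derived values (and the variable names) are just my own arithmetic on them, not something the paper reports.

```python
# Figures taken from the quoted Hill and Laufer (2003) estimate.
WORDS_READ = 8_000_000   # running words of text
NOVELS = 420             # "about 420 novels"
WORDS_GAINED = 2_000     # increase in vocabulary size

# Derived values (my own arithmetic, not stated in the source).
words_read_per_word_gained = WORDS_READ / WORDS_GAINED
implied_words_per_novel = WORDS_READ / NOVELS
novels_per_100_words_gained = NOVELS * 100 / WORDS_GAINED

print(f"Running words read per new word learned: {words_read_per_word_gained:,.0f}")
print(f"Implied length of one novel: {implied_words_per_novel:,.0f} words")
print(f"Novels needed per 100 new words: {novels_per_100_words_gained:.0f}")
```

So at those incidental pick-up rates, you'd read roughly 4,000 running words for every new word gained, which is why the quote calls it a daunting prospect.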
That quote is mostly about ungraded materials with no supplementation such as glossing or frequency-based design. Overall, for incidental/extensive reading without explicit study, graded material doesn't fare much better, but on graded vs. ungraded extensive/incidental reading:
“... One way of incorporating incidental learning into a language program is to
organize an extensive reading component (Day & Bamford, 1998). Although
readers need to know 98–99% of the words in a text, many authentic texts will
still be suitable for more advanced learners, especially if teachers provide support
for the more difficult vocabulary (see below). However, for developing
learners, the vocabulary load will probably be too high in authentic texts, and
so the use of graded readers is recommended, as the vocabulary load is both
fine-tuned for the learner’s level, and systematically recycled (Nation &
Wang, 1999; Al-Homoud, 2007).
Graded readers used to have a bad reputation
for being boring and poorly written, but that is no longer the case, with
several major publishers providing a series of interesting and well-presented
readers. Most importantly, research shows that substantial vocabulary learning
can be derived from graded readers. For example, Horst (2005) found
that her participants learned over half of the unfamiliar words they encountered
in the graded readers they read.”
Another nice bit somewhat related to Nation's meaning-focused vs. language-focused Four Strands paper:
“... there are good reasons to believe that vocabulary requires a different approach
which incorporates explicit attention to learning the lexical items themselves:
• learners who understand the overall message often do not pay attention
to the precise meanings of individual words
• guessing from context is often unreliable, especially if the learner does
not know 98% of the words in the discourse
• words which are easily understood (guessed) from context may not
generate enough engagement to be learned and remembered
• new words which learners have met in discourse need to be met again
relatively quickly to avoid their being forgotten. In order for words to
be met 10 times in reading, learners would need to read 1–2 graded readers
per week. The typical learner simply does not read this much.
(Laufer, 2005)”
Of course, we now have all sorts of tools to blend it all together, especially with direct card creation. ^_^ Interestingly, they seem to find that explicitly studying words after the task in which you encountered them works very well, which fits nicely with reading followed by SRSing the words you added with Rikaisan. They also found that reading with audio beats reading alone, and both of those beat listening alone.
Edit: It also seems that for shorter texts with fewer total unknown words (e.g. 750 running words, i.e. a word count of 750), a density of 1 unknown word in 15 running words fares better than 1 in 50. Again, this is for contextual inference in English, without glosses in the margins, etc.
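For anyone who wants to line those densities up with the coverage percentages from earlier, here is a small sketch. The function names are made up for illustration, and it assumes unknown words are spread evenly through the text, which real texts won't do exactly.

```python
def coverage(one_unknown_in: int) -> float:
    """Fraction of running words that are known, given 1 unknown word in N."""
    return 1 - 1 / one_unknown_in


def unknown_tokens(running_words: int, one_unknown_in: int) -> float:
    """Expected number of unknown running words in a text of the given length."""
    return running_words / one_unknown_in


TEXT_LENGTH = 750  # running words, the example length mentioned above

for density in (15, 50):
    print(
        f"1 in {density}: coverage {coverage(density):.1%}, "
        f"about {unknown_tokens(TEXT_LENGTH, density):.0f} unknown words "
        f"in {TEXT_LENGTH} running words"
    )
```

In other words, 1 in 50 is the 98% coverage figure, while 1 in 15 is only about 93%, but in a 750-word text that still amounts to only around 50 unknown running words in total.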
Edited: 2011-06-22, 1:53 pm