I'm also concerned by this Moody person's arguments, but I find them very slight compared to the overall research we're seeing on fluid intelligence, working memory, neuroplasticity, etc. There's been other research into this type of training, and I think it all builds a compelling picture. I've never done this dual n-back task, but I think it's on the right track. Did you read the other papers, or just the ones that made you feel better about not doing it? ;p It's been less than two years, so I'm hopeful.
More on Moody's paper w/ excerpts:
http://groups.google.com/group/brain-tra...e00fe9bca9
In fact, the more closely I analyze Moody's refutation in those excerpts, the shoddier it seems to me. Wait, it's not an excerpt? That's it, his whole response? No wonder they didn't bother replying. He complains about their methods very shallowly and selectively. Are you kidding me? This is what I was concerned about? I take back what I said earlier, I'm not dissatisfied with their response to Moody at all. I'm surprised he was even mentioned anywhere. ;p
You could literally pick his flawed logic apart without even referring to the original paper.
Rereading the original paper after Moody's 'criticism', his response doesn't just seem shallow and shoddy, but even deceitful in how it misrepresents the original paper's purposes and explanations. Tsk tsk.
Edit: I just went through it and wrote responses piece by piece as if engaged in a forum style quote-argument with Moody. That was fun.
[What Jaeggi et al. reported were modest increases in performance on a
test of fluid intelligence following several days of training on a
task of working memory. The reported increases in performance are not
in question here. But the manner in which the test was administered
severely undermines the authors' interpretation that their subjects'
intelligence itself was increased.]
"Not in question"--except that 'modest' and 'severely' are doing rhetorical work that pulls in opposite directions, while the paper repeatedly uses 'significant', a statistical term, backed with numbers. I also think 'intelligence' should always be put in quotes by everyone in this field who writes about it like it's a concrete thing.
[The subjects were divided into four groups, differing in the number of
days of training they received on the task of working memory. The
group that received the least training (8 days) was tested on Raven's
Advanced Progressive Matrices (Raven, 1990), a widely used and well-
established test of fluid intelligence. This group, however,
demonstrated negligible improvement between pre- and post-test
performance.]
Moody implies this is because of the test difference and calls it negligible--I thought the increase wasn't in question? They say it's statistically significant--they're measuring internal changes rather than waxing rhetorical. They also say: "... subsequent analysis of the gain scores (posttest minus pretest) as a function of training time (8, 12, 17, or 19 days) showed that transfer to fluid intelligence varied as a function of training time (F(3,30) = 9.25; P < 0.001; ηp2 = 0.48; Fig. 3b)... These analyses indicate that the gain in fluid intelligence was responsive to the dosage of training."
Doesn't it make sense that the least training would show the least improvement? Have they determined the minimal effective time--is that even the purpose of the research at that stage of the study? They did another paper where they say they replicated the results of this research, but with both the RAVEN and the BOMAT. Keep in mind that this version of the BOMAT is already officially called the 'short version' and 'advanced'--and the way the timing guidelines are described gives the impression that the time limit is based on the general needs of students, as a matter of schedules and effort, rather than on some scientific criterion for test validity.
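That dose-response analysis is easy to illustrate. Here's a minimal sketch of the gain-score calculation (posttest minus pretest, grouped by days of training); the scores below are invented for illustration, not Jaeggi et al.'s data:

```python
# Hypothetical illustration of the gain-score analysis Jaeggi et al.
# describe: gain = posttest - pretest, grouped by training dosage.
# All scores are invented; only the structure of the analysis is real.
pre  = {8: [10, 11, 9],  12: [10, 9, 11],  17: [11, 10, 10], 19: [9, 10, 11]}
post = {8: [10, 12, 10], 12: [12, 11, 13], 17: [14, 13, 14], 19: [14, 15, 15]}

def mean_gain(days):
    """Average (posttest - pretest) for one training-duration group."""
    gains = [b - a for a, b in zip(pre[days], post[days])]
    return sum(gains) / len(gains)

for days in sorted(pre):
    print(days, "days ->", round(mean_gain(days), 2))
```

With numbers like these, the 8-day group's gain looks "negligible" only next to the larger gains of the longer-training groups--which is exactly the dose-responsive trend the paper reports (the real claim rests on their ANOVA, not this toy arithmetic).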
[The other three groups were not tested using Raven's Matrices, but
rather on an alternative test of much more recent origin. The Bochumer
Matrices Test (BOMAT) (Hossiep, Turck, & Hasella, 1999) is similar to
Raven's in that it consists of visual analogies. In both tests, a
series of geometric and other figures is presented in a matrix format
and the subject is required to infer a pattern in order to predict the
next figure in the series. The authors provide no reason for switching
from Raven's to the BOMAT.]
I wouldn't call it 'switching'--each group did one or the other from the beginning. I would presume that's because the BOMAT is more recent and more difficult, and has parallel A/B versions rather than requiring odd and even #s? I don't see how it matters unless you're comparing the RAVEN people with the BOMAT people as control/experimental groups, especially in light of the dose-responsive trend for 12+ days...
[The BOMAT differs from Raven's in some important respects, but is
similar in one crucial attribute: both tests are progressive in
nature, which means that test items are sequentially arranged in order
of increasing difficulty. A high score on the test, therefore, is
predicated on subjects' ability to solve the more difficult items.]
A high score in this study is as specified in the study; "more difficult" is relative. I imagine people who compare the two tests would say those important respects come down to the BOMAT's restandardization and supposed superiority, perhaps related to the Flynn effect, but I'm not sure. They say the BOMAT is more difficult, so if they were comparing the 8-day group against the rest in a control/experimental way--which they aren't--I'm sure the easiest BOMAT items would be checked against the easiest staggered items of the RAVEN. Which would be weird, and if they were doing such a study I'm sure they'd use a better methodology.
[However, this progressive feature of the test was effectively
eliminated by the manner in which Jaeggi et al. administered it. The
BOMAT is a 29-item test which subjects are supposed to be allowed 45
min to complete. Remarkably, however, Jaeggi et al. reduced the
allotted time from 45 min to 10. The effect of this restriction was to
make it impossible for subjects to proceed to the more difficult items
on the test. The large majority of the subjects—regardless of the
number of days of training they received—answered less than 14 test
items correctly.]
Effectively eliminated how? I think the 45 minutes was a guideline--the site mentioned 30 as well, but I don't know French/German/whatever language the site is in, so I couldn't make out what they meant. I'm not sure the test's internal structure is time-dependent. I'm sure future studies will (and did, actually--see the end of my first comment) vary n-levels, days of training, test intervals, etc., proportionately. Plenty of research still to be done.
And they say: "To keep the pre- and posttest sessions short enough, we allowed limited time (10 min) to complete the task, and the number of correct solutions provided in that time served as the dependent variable."
Also: "Carpenter et al. (1) have proposed that the ability to abstract relations and to maintain a large set of possible goals in working memory accounts for individual differences in tasks such as the Raven's Advanced Progressive Matrices test, and therefore in Gf. This ability to maintain multiple goals in working memory seems especially crucial in speeded Gf tasks because one can speed performance by maintaining more goals in mind at once to foster selection among representations. Therefore, after training working memory, participants should be able to come up with more correct solutions within the given time limit of our speeded version of the Gf task. "
[By virtue of the manner in which they administered the BOMAT, Jaeggi
et al. transformed it from a test of fluid intelligence into a speed
test of ability to solve the easier visual analogies.]
They did? How so? And easier compared to what? The later questions aren't factored in that way. See above.
[The time restriction not only made it impossible for subjects to
proceed to the more difficult items, it also limited the opportunity
to learn about the test—and so improve performance—in the process of
taking it. This factor cannot be neglected because test performance
does improve with practice, as demonstrated by the control groups in
the Jaeggi study, whose improvement from pre- to post-test was about
half that of the experimental groups. The same learning process that
occurs from one administration of the test to the next may also
operate within a given administration of the test—provided subjects
are allowed sufficient time to complete it.]
Again, more difficult compared to what? Also, isn't Moody's point here really that these tests partly measure test-taking ability--learning the test while taking it? Good: that's something critics of these intelligence tests have noted before, and something I think this study goes a long way toward overhauling, paradigmatically. And they explicitly take it into consideration: "To control for the impact of individual differences and gain in working memory capacity, a digit-span task (38), as well as a reading span task (39), was used in the pre- and postsession. However, the reading span task was not assessed in the 8-day group. "
Plus: "Examining the transfer task in terms of the processes involved, there is evidence that it shares some important features with the training task, which might help to explain the transfer from the training task to the Gf measures. First of all, it has been argued that the strong relationship between working memory and Gf primarily results from the involvement of attentional control being essential for both skills (22). By this account, one reason for having obtained transfer between working memory and measures of Gf is that our training procedure may have facilitated the ability to control attention. This ability would come about because the constant updating of memory representations with the presentation of each new stimulus requires the engagement of mechanisms to shift attention. Also, our training task discourages the development of simple task-specific strategies that can proceed in the absence of controlled allocation of attention. "
[Since the whole weight of their conclusion rests upon the validity of
their measure of fluid intelligence, one might assume the authors
would present a careful defense of the manner in which they
administered the BOMAT. Instead they do not even mention that subjects
are normally allowed 45 min to complete the test. Nor do they mention
that the test has 29 items, of which most of their subjects completed
less than half.]
They probably found it irrelevant. + See above. But hopefully we'll start seeing plenty more research now that the 'fixed' paradigm has been busted in yet another way.
[The authors' entire rationale for reducing the allotted time to 10 min
is confined to a footnote.]
Really, their entire rationale? Seems like they repeatedly explained it to me.
[That footnote reads as follows:
Although this procedure differs from the standardized procedure,
there is evidence that this timed procedure has little influence on
relative standing in these tests, in that the correlation of speeded
and non-speeded versions is very high (r = 0.95; ref. 37).
The reference given in the footnote is to a 1988 study (Frearson &
Eysenck, 1986) that is not in fact designed to support the conclusion
stated by Jaeggi et al. The 1988 study merely contains a footnote of
its own, which refers in turn to unpublished research conducted forty
years earlier. That research involved Raven's matrices, not the BOMAT,
and entailed a reduction in time of at most 50%, not more than 75%, as
in the Jaeggi study.]
I'm pretty sure their reference to 'relative standing' and to speeded vs. non-speeded versions ties into the area of research comparing mental chronometry and inspection time (the terms used by the sources in question) with the viability of measuring fluid intelligence, and to newer notions of those processes. As stated above, they have their own purposes, and if you're going to question the time allotted and the progressive structure of the tests, I'm sure the BOMAT's authors could explain why they offer both 30- and 45-minute guidelines with progressive difficulty, and how that relates to testing integrity overall.
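For what "relative standing" means concretely: a high correlation between speeded and non-speeded scores says the time limit barely reorders the test-takers. A minimal Pearson-r sketch (in any real check the score pairs would come from actual test-takers; nothing here is Jaeggi's or Frearson & Eysenck's data):

```python
# Pearson correlation between two score lists, e.g. speeded vs.
# non-speeded administrations of the same test to the same people.
# r near 1.0 means relative standing is preserved despite the time limit.
def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5
```

So the footnote's r = 0.95 is a claim about rank preservation, not about absolute scores--which is all "relative standing" needs.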
[So instead of offering a reasoned defense of their procedure, Jaeggi
et al. provide merely a footnote which refers in turn to a footnote in
another study. The second footnote describes unpublished results,
evidently recalled by memory over a span of 40 years, involving a
different test and a much less severe reduction in time.]
Nitpicking over irrelevancies with rhetorical flourishes. See above. Also see the other responses in that Google Groups thread, including Granstrom's later comments in the mouse working memory thread on the same Google Groups site.
[In this context it bears repeating that the group that was tested on
Raven's matrices (with presumably the same time restriction) showed
virtually no improvement in test performance, in spite of eight days'
training on working memory. Performance gains only appeared for the
groups administered the BOMAT. But the BOMAT differs in one important
respect from Raven's. Raven's matrices are presented in a 3 × 3
format, whereas the BOMAT consists of a 5 × 3 matrix configuration.]
"Virtually none" now, instead of "negligible"? They showed the least improvement--is that "in spite of" eight days of training, or because eight days is the minimum dose? Is the 8-day group being set against the 12+ day groups as if this were a controlled comparison? Given replications of their results, and similar results with other tests, I'm inclined toward the dosage explanation rather than some nefarious plot or severe gaps in logic involving different types of testing. And "presumably"? They clearly state the time restrictions were the same, no?
[With 15 visual figures to keep track of in each test item instead of
9, the BOMAT puts added emphasis on subjects' ability to hold details
of the figures in working memory, especially under the condition of a
severe time constraint. Therefore it is not surprising that extensive
training on a task of working memory would facilitate performance on
the early and easiest BOMAT test items—those that present less of a
challenge to fluid intelligence.]
Really, not surprising? They go into that whole contested area of how working memory and fluid intelligence relate, and explain in lengthy detail how the tasks relate to attentional control and the like, yes? That's a rich field of research: there are at least two studies cited at Wikipedia on working memory that discuss irrelevant information and distractors, plus the new research on focus and attention that's ever-present in RSS feeds from Science Daily and the like. I'm not sure what Moody's on about with 'easiest' and 'less of a challenge to fluid intelligence'--in relation to what? The measure is of transfer effects and gains in performance within the study. I'm assuming continuing research will take broader approaches.
Right now I'm also surprised there hasn't been more criticism of this study--how many years before someone with a vested interest in genetic, fixed IQ tries to debunk them more robustly, vs. the eager-beaver software developers wanting to market an IQ-boosting gimmick, subverting a good bit of research for their own purposes?
[This interpretation acquires added plausibility from the nature of one
of the two working-memory tasks administered to the experimental
groups. The authors maintain that those tasks were “entirely
different” from the test of fluid intelligence. One of the tasks
merits that description: it was a sequence of letters presented
auditorily through headphones.]
Yes, they say 'entirely different' in the introduction, and they explain it as I previously quoted and in whole sections of paragraphs such as this one: "Operationally, we believe that the gain in Gf emerges because of the inherent properties of the training task. The adaptive character of the training leads to continual engagement of executive processes while only minimally allowing the development of automatic processes and task-specific strategies. As such, it engages g-related processes (5, 17). Furthermore, the particular working memory task we used, the “dual n-back” task, engages multiple executive processes, including ones required to inhibit irrelevant items, ones required to monitor ongoing performance, ones required to manage two tasks simultaneously, and ones required to update representations in memory. In addition, it engages binding processes between the items (i.e., squares in spatial positions and consonants) and their temporal context (30, 31). " Or:
"However, our additional analyses show that there is more to transfer than mere improvement in working memory capacity in that the increase in Gf was not directly related to either preexisting individual differences in working memory capacity or to the gain in working memory capacity as measured by simple or complex spans, or even, by the specific training effect itself.
Therefore, it seems that the training-related gain on Gf goes beyond what sheer capacity measures even if working memory capacity is relevant to both classes of tasks. Of course, tasks that measure Gf are picking up other cognitive skills as well, and perhaps the training is having an effect on these skills even if measures of capacity are not sensitive to them. One example might be multiple-task management skills. Our dual n-back task requires the ability to manage two n-back tasks simultaneously, and it may be this skill that is common to tasks that measure Gf. Our measures of working memory capacity, by contrast, index capacity only for simpler working memory tasks that are not so demanding of multiple-task management skills. So, sheer working memory capacity alone may be an important component of measures of Gf, but beyond this capacity, there may be other skills not measured by simpler working memory tasks that are engaged by our training task and that train skills needed in measures of Gf."
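For anyone who hasn't seen the task: here's a minimal sketch of the dual n-back's structure--two simultaneous streams, each checked against itself n steps back. This is just an illustration of the task's shape, not Jaeggi et al.'s software; the 8-position grid layout and the particular consonant set are assumptions.

```python
import random

# Two independent streams (spatial position, spoken consonant) are
# presented together; a "match" means the current stimulus equals the
# one n steps back in that stream. The player must track both at once.
POSITIONS = list(range(8))      # e.g. 8 squares around a grid center (assumed)
CONSONANTS = list("CHKLQRST")   # consonant set is an assumption

def make_trials(length, seed=0):
    """Generate a random sequence of (position, consonant) trials."""
    rng = random.Random(seed)
    return [(rng.choice(POSITIONS), rng.choice(CONSONANTS))
            for _ in range(length)]

def matches(trials, n):
    """Return (position_match, sound_match) flags for each trial past n."""
    out = []
    for i in range(n, len(trials)):
        pos_now, snd_now = trials[i]
        pos_back, snd_back = trials[i - n]
        out.append((pos_now == pos_back, snd_now == snd_back))
    return out
```

The adaptive part the paper emphasizes--raising n after good blocks, lowering it after bad ones--is what keeps executive processes continually engaged and blocks simple task-specific strategies.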
[But the other working-memory task involved recall of the location of a
small square in one of several positions in a visual matrix pattern.
It represents in simplified form precisely the kind of detail required
to solve visual analogies. Rather than being “entirely different” from
the test items on the BOMAT, this task seems well-designed to
facilitate performance on that test.]
Not exactly. See above. I don't think inferring patterns in visual analogies and recalling multiple components, as in the dual n-back, are all that similar. However, they've also found single n-back effective--see my first comment. But yes, I do think we'll find that test-taking measures test-taking, and that where it doesn't, it involves trainable working-memory-type skills. From what I've seen, the brain's physical capacity limits, as influenced by environment and heritability, vary little between individuals compared to that; and by the time we can isolate and measure those capacities in real time, in relation to the countless other variables that practically influence intra-individual variation, we'll be well beyond rhetoric and notions of 'fixed' anything.
Personally I'm more interested in the idea of applying multitasking exercises to what individuals actually intend to study/do in life, re: multimodal integration.