Is it testing, “retrieval practice,” or writing?

“To Really Learn, Quit Studying and Take a Test,” declares the alluring and misleading headline of a NY Times article by Pam Belluck, describing the findings of yet another paper on teaching and learning published by Science, online in Science Express (Karpicke and Blunt, 2011). However, the study described has nothing to do with testing as a method to improve learning.

The title of the Science Express paper is “Retrieval practice produces more learning than elaborative studying with concept mapping.” The study by Jeffrey Karpicke and Janelle Blunt is actually about determining which study methods work better. I found their paper compelling reading, first because they tested college students (Purdue undergraduates) learning real concepts in biology, and second because they rigorously tested concept mapping, a widely accepted and promoted “active-learning” strategy, against what they call “retrieval practice,” which turns out to be writing.

For some time now, advocates of science teaching reform have promoted “Active Learning” as an alternative to straight lectures with PowerPoint slides. I have joined this reform movement, because I hate seeing students in lecture halls sitting passively, if not nodding off, chatting with their neighbors, texting, or otherwise multitasking on their laptops. Few students these days bother to take notes. As a teacher, I want to get my students actively engaged with the material, thinking, talking, writing, and drawing science. Many studies have shown that such active learning methods produce better outcomes than lecture-only. What I dearly want to know is whether some methods work better than others, and how I can advise students to get the most out of the time they do spend studying.

Karpicke and Blunt report results from two experiments. Experiment 1 compared four conditions, with 4 different groups of 20 students each reading a 276-word biology text on “Sea Otters.” All the groups had 5 minutes to read this text. The control “Study” group had no further study time. The “Repeated Study” group was given 3 additional 5-minute study periods to re-read the same text, with 1-minute breaks in between, so these students studied the text a total of 4 times. The “Concept Mapping” group, after reading the text for 5 minutes, was instructed briefly in concept mapping, shown an example of a concept map, and then given 25 minutes with a sheet of paper to construct a concept map, while being allowed to consult both the text and the example map. The “Retrieval Practice” group, after the initial 5 minutes of reading, was given 10 minutes to type into a text box on the computer screen all the information they could remember from the text. They then re-read the original text for 5 minutes and repeated the 10-minute recall typing exercise.

Thus the “Concept Mapping” and “Retrieval Practice” groups had equal amounts of total learning and study time (30 minutes), more than the “Repeated Study” group (20 minutes) and the “Study” group (5 minutes). Assessments of the concept maps and the written recalls showed that the concept maps captured 78% of the conceptual information, and the first and second recalls captured 64% and 81%, respectively. By this measure, there was no significant difference at the end of the learning phase between the concept maps and the second recall writings.
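The study-time totals for Experiment 1 are easy to double-check. The following is only a minimal sketch in Python: the per-period minutes come from the description above, while the names `schedule` and `total_minutes` are my own, and I am assuming that the 1-minute breaks and the brief concept-mapping instruction are not counted as study time.

```python
# Minutes per reading/activity period for each group in Experiment 1,
# taken from the description in the post. Assumption: the 1-minute breaks
# and the brief concept-mapping instruction are not counted as study time.
schedule = {
    "Study": [5],                        # one 5-minute reading
    "Repeated Study": [5, 5, 5, 5],      # initial reading plus 3 re-readings
    "Concept Mapping": [5, 25],          # reading, then building the map
    "Retrieval Practice": [5, 10, 5, 10] # read, recall, re-read, recall
}


def total_minutes(periods):
    """Return the total time, in minutes, across all periods."""
    return sum(periods)


totals = {group: total_minutes(periods) for group, periods in schedule.items()}

for group, minutes in totals.items():
    print(f"{group}: {minutes} minutes")
```

Under these assumptions, the two active conditions come out equal, which is what makes the head-to-head comparison between them a fair one in terms of time on task.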

A week later, the students took a paper-and-pencil test on this material, with 14 short-answer questions to test verbatim recall (Figure 1A) and 2 questions to test whether students could make inferences based on the concepts (Figure 1B). The results showed that Retrieval Practice produced the highest test scores, by a significant margin. What is surprising, and perhaps shocking to faculty who like to use concept mapping in their classes, is that students who did the Concept Mapping fared no better than students who spent less time with the most boring study method – repeatedly re-reading the same text.

One might expect that students who did concept mapping would fare best on inferential questions, which require students to mentally connect concepts, but Figure 1B shows the same pattern of performance as in Figure 1A. What I find remarkable is that students who did the concept mapping were the most confident about their learning – after the study period, all students were asked to predict how much they would remember, and those who did concept mapping thought they would remember the most (Figure 1C). In contrast, the students who did the Retrieval Practice were less confident about their learning than even the control Study students, but outperformed all other groups. This clearly shows that students are not good judges of their own learning, and that student surveys are not reliable indicators of what methods are most effective at promoting student learning.

So were these results a fluke of relatively small sample size, given that each group had only 20 students? The authors used quite a different strategy for Experiment 2. Here they used a larger cohort (120 students), a head-to-head comparison of Concept Mapping vs Retrieval Practice, and a within-subjects design. Each student read 2 texts. With one of the texts, they used Retrieval Practice. With the other text, they used Concept Mapping. One additional factor thrown in was the kind of information: sequence text presented concepts that were structured like steps in a sequential process, such as “Digestive System”; and enumeration text presented concepts where order was unimportant, such as “Makeup of Human Blood”.  Would Concept Mapping be superior for learning sequential information?

The actual studying procedures were similar to Experiment 1. Remember that the difference now is that each student used both study methods: Concept Mapping for one text, Retrieval Practice for the other. Again, the concept maps and the written recalls were evaluated at the end of the learning phase. This time the concept maps captured significantly more information (74%) than either the first or second recall writings (46% and 65%, respectively).

The testing phase a week later added a significant new wrinkle: half the students were asked to create a concept map (without looking at the text), and half the students were asked short-answer questions as before.

What is remarkable about the test results is their consistency. Retrieval Practice is superior, even when students are tested by Concept Mapping. The same student performed better on the subject he or she had studied by Retrieval Practice than on the subject he or she had studied by Concept Mapping. It didn’t matter if the subject was sequential information or enumerative information. It didn’t matter if the test was to construct a paper-and-pencil concept map or type short written answers. Students who had studied a text by Concept Mapping did not do as well at reconstructing the concept map a week later as students who had studied the same text by Retrieval Practice.

I can’t think of any way around the conclusion: Retrieval Practice is better than Concept Mapping. Why?

Concept Mapping is popular among active-learning enthusiasts because it engages students to make mental constructs. As performed in these experiments, it is “elaborative” study – students work with the text to add their own interpretation. Retrieval Practice seems little different from rote recall, a seemingly less interesting or lower form of active learning. But clearly something powerful happens during Retrieval Practice that promotes long-term learning.

The authors cite evidence that the act of free recall (recalling in the absence of specific prompts) forces the subject to organize a mental “retrieval structure” and then sort and recover individual concepts within the structure. The practice part allows the subject to see the gaps in the structure, and fill them in.

As someone who has always despised memorization, but is sometimes fond of writing, I wonder if the act of writing (typing into a text box) is part of this learning effect. Would Retrieval Practice work as well if the subjects orally recited their recall? What if the Concept Mapping had been performed in a way similar to Retrieval Practice – a Retrieval Concept Mapping, where students read the text, make a concept map from recall, re-read the text, and revise their concept maps, again from recall?

There have been a few recent papers about how brief writing exercises can eliminate the gender gap in physics (Miyake et al. 2010), enhance academic performance among black students (Cohen et al. 2009), and alleviate test anxiety (Ramirez and Beilock 2011). While the student writing in these other papers is not germane to any academic subject matter, the papers nevertheless demonstrate that the act of writing can have powerful, long-lasting effects. Would these studies have seen the same effects if the subjects had talked about their values or anxieties rather than writing about them? Does writing affect our brains in a special way?


Karpicke, JD and JR Blunt 2011. Retrieval Practice Produces More Learning than Elaborative Studying with Concept Mapping. Science DOI:10.1126/science.1199327

Cohen, GL, J Garcia, V Purdie-Vaughns, N Apfel, P Brzustoski 2009. Recursive Processes in Self-Affirmation: Intervening to Close the Minority Achievement Gap. Science 324:400-403

Ramirez, G and SL Beilock 2011. Writing about Testing Worries Boosts Exam Performance in the Classroom. Science 331:211-213

Miyake, A, LE Kost-Smith, ND Finkelstein, SJ Pollock, GL Cohen, TA Ito 2010. Reducing the Gender Achievement Gap in College Science: A Classroom Study of Values Affirmation. Science 330:1234-1237

About jchoigt

I'm an Associate Professor in the School of Biology at Georgia Tech, and Faculty Coordinator of the Professional MS Bioinformatics degree program.
This entry was posted in Teaching and learning biology.

9 Responses to Is it testing, “retrieval practice,” or writing?

  1. David Garton says:


    Thanks for calling attention to these papers; I enjoyed reading them and the results are quite intriguing. Two points I’d like to make:

    1) Retrieval studying methods clearly yield the best performance on the assessment tool (a test). An obvious question is how Concept Mapping and Retrieval Method might compare in a more open-ended, self-investigative learning environment: would Concept Mapping provide a better framework for organizing and finding less-than-obvious linkages? Harder to assess perhaps, but still an important skill to master.

    2) Can these results be interpreted that “old school” teaching methods (i.e. memorize the book) often derided as merely “teaching by rote” are actually more effective in promoting learning than is generally believed? Or does this simply mean you do better on tests that favor recall and short-term memory? It would be worthwhile to consider follow-up studies examining retention of information among learning methods.

  2. jchoigt says:

    Hi Dave, thanks for your comments. I agree that Concept Mapping may be short-changed in this study. I wonder if it would do better with more complex topics, and if students are asked to make explicit connections to other concepts from their prior knowledge, previous topics, topics from previous courses, etc.

  3. Tod Duncan says:


    As eloquent and insightful as ever; thanks for the summary. This paper has been sitting on my desk…

    One thing that strikes me is that the assessment sounds ‘knowledge’ based (excepting of course when inference was evaluated.)

    I need to think more on the implications for higher levels of Bloom’s – with mastery of higher level skills being something that is, to me, significantly more valuable than ‘knowledge retention’ at the level of student I am working with (first semester biology majors). With information so abundant today relative to even when I was an eager Freshman with a twinkle in my eye, what value do we need to continue placing on recall of knowledge? Do we want to know the best way to promote knowledge retention simply because that is what we, rightly or wrongly, value in our educational systems?

    I have no doubt that for me, repetition, repetition, repetition, and frequent testing and re-testing is what drives my retention of knowledge. If I created a concept map, and then re-did it, re-did it, re-did it, I am pretty sure that I would learn more than from one instance of that exercise. Have the authors shown that there is a difference between the study mechanisms – ‘retrieval practice’ versus concept mapping – or between repetition of one (retrieval practice – heavens, you got to PRACTICE it, no wonder you learned more) and non-repetition of the other (which is thus inherently more passive as the students stare at their first attempt at a concept map)? I wonder.

    David: I would concur with your statements in 2) above; the mechanism of learning needs to align with the learning objective. I don’t use concept mapping to help students with knowledge recall. If the two don’t align, ‘learning’ – whatever that looks like – probably won’t occur.

    Anyway, great summary and food for thought; thanks again. It would be great to see a discussion/presentation session (e.g. with you as a leader!) of a couple of papers like this at an event such as BLC…Robin? Maybe even a round-table kind of thing.


  4. This is a fairly thorough restatement of the paper, which is unfortunately hidden from many readers behind Science’s expensive subscription.

    My comments on the original article are at

  5. Jung,

    Thanks for reviewing the Karpicke & Blunt article in Science and also for pointing out the NY Times article. I am a retired biologist who still teaches online for The University of Texas at El Paso and just completed a year teaching as a Visiting Assistant Professor at Washington College in Chestertown, MD. I have slowly backed into the world of teaching pedagogy and a few weeks ago gave a presentation at The Conference on Higher Education Pedagogy at Virginia Tech titled “Linking Online Formative Assessment With Study Time And Student Learning”. I saw the article in Science and read the NY Times article a week before my presentation and was delighted because it seemed to support my way of using practice quizzes, which I call Quizlets. I studied student performance in three courses I taught at Washington College – General Biology, Cell Biology and Developmental Biology – using data collected in the Blackboard grade book. In short, the data showed that students who did more Quizlets got better scores on exams than students who did fewer or none at all. The Quizlets are taken as 10-question tests, with questions randomly selected from pools of 50 – 80 questions per chapter, and students can take them as many times as they like. They do not count toward the final grade. In addition, I could show that students improved their scores on Quizlets the more they took them, and what really, quite frankly, blew me out of the water was being able to show a moderate to high correlation, using an R-squared calculation, between the scores students got on their last five practice Quizlets and the scores they received on the lecture exam, which did count toward their grade. If you are interested, you can check out the graphics for my presentation at (give the presentation 30 – 60 seconds to download completely as it is a large Flash file) and I would love to hear your thoughts about this approach.

    As I mentioned earlier, I have been slowly moving into what might be described as teaching scholarship, i.e., trying something new in the classroom, measuring its effectiveness, presenting at meetings and eventually publishing in peer-reviewed journals, so any help you can provide as to the significance of this type of formative assessment, especially in a climate of science instruction which emphasizes critical thinking and making sure that students understand how science is done, would be very helpful to me. I do not disagree with what I read about STEM strategies for learning, but I wonder what role memorization, and forms of learning such as the Quizlets that I have studied, play in this new vision of learning when viewed through the lens of retrieval practice.

    Thanks again for bringing this topic to our attention and thanks for any help you can provide me in finding a home for Quizlets as a useful formative assessment tool in STEM education.


  6. Nancy Pelaez says:

    Hi, Jung and all,

    Karpicke, who is my colleague at Purdue, says the NSF press release was very good and accurate, but he is not too happy with the NY Times article and wonders why they keyed in on “testing” when the investigation was about retrieval practice. For a better report, see
     – National Science Foundation (NSF) News – Science … 

  7. I am a big believer in, “See one, do one, teach one.” I would like to have seen a test group asked to summarize or teach sections of the text to other members of the group, after which a full summary would be collated amongst all members. As a Science article from April, 2011 suggests, teaching improves graduate student research because it makes one think about the concepts involved and places important details in context with important concepts.

    The study is also biased in that it does not take into account prior interest, prior knowledge, or funds of knowledge. Each of these items can be utilized in a group activity, to increase the learning of the entire group, not just an individual.

    Information retrieval is not learning, merely memorizing – we should be striving for more.

    To this end, a new blog, As Many Exceptions as Rules, tries to use interesting exceptions to biological rules to increase student interest while at the same time reinforcing core concepts in biology. Each post includes a list of sites for further information and possible classroom activities and labs. The blog is meant to be a starting point or adjunct, not a curriculum, so student and teacher participation in how the topics relate to classroom topics is crucial.

    Recent topics have included biological concepts of heterotrophs/autotrophs, evolution, and horizontal gene transfer that are involved with a photosynthetic sea slug. Also, a story of megabacteria relates the rules of size limitations placed on microorganisms by diffusion kinetics. A current story relates sense organs, otoacoustic emissions, and body symmetry in owls.

    A future story will include rules on body heat generation, circadian rhythms, and a plant that can regulate, not just increase its temperature. I invite science geeks, teachers and students to check out this blog.

  8. grepinsight says:

    Dear Jung,

    Thank you very much for nicely summarizing the papers! The findings in these papers make me think about a lot of things, particularly in the domain of foreign language learning. I have been learning Spanish in my free time via Duolingo, and it has been working super well for me, surpassing any other method I have tried in terms of retention and actually using the language. I have been curious about its effectiveness, and now I can see why.
