Veronica Yan: More on Learning and Desirable Difficulties

Veronica Yan, a PhD student in the Bjork cognitive psychology lab at UCLA, gave a talk tonight at Pearson’s Mastering Leadership Conference about different study practices and student learning. I heard Robert Bjork talk about “desirable difficulties” a couple of years ago at Pearson’s Biology Leadership Conference, and it transformed the way I teach and how I advise students to improve their study practices. The research findings, that spaced study is superior to massed study, that testing (retrieval practice) is superior to repeated study, and that interleaved study is superior to blocked study, are all counter-intuitive, but robust and reproducible in multiple contexts. Robin Heyden summarized Robert Bjork’s talk very nicely in her blog post, so I won’t repeat those points here. In any case, the NY Times also ran a story in 2010. Veronica Yan summarized some of the same work, and expanded into new but related territory.

Multiple Choice Questions

After relating the results that showed the remarkable effects of retrieval practice (repeated testing) on long-term memory, Veronica discussed multiple choice tests, a “necessary evil” for those of us who teach large classes. Poorly designed multiple choice questions, with non-competitive distractors (answers that are clearly wrong), encourage students to answer via pattern-recognition, and do not elicit the cognitive benefits of retrieval practice. However, multiple choice questions with competitive distractors force students to engage in ways that benefit recall of related information. Students given multiple choice questions with competitive lures performed better on a subsequent cued recall test of related information than students given questions with non-competitive lures. This improved recall of related information occurred even if the students answered the original multiple choice questions incorrectly.

Errorless Learning

This brings up the fascinating idea that making mistakes can enhance learning, as long as the mistakes are corrected. Veronica Yan described an experiment where some subjects are given 13 seconds to study a word association, such as whale: mammals. Another group of subjects are given 8 seconds to guess at the association of whale: ___?, then shown whale: mammals for 5 seconds. In later tests, the group that initially guessed at alternative associations (such as whale: dolphin) did better at remembering the correct associations than those who had studied just the correct associations.

Productive failure

She then discussed research by Kapur and Bielaczyc (2012) on “productive failure.” Groups of students in 3 Singapore schools worked on complex problems. The “productive failure” groups worked for 6 periods with no instruction or assistance from their teacher, and then received 1 period of instruction. These students got 0%, 7%, and 16% correct solutions to their complex problems. The “traditional instruction” group received 7 periods of directed instruction from their teacher, and achieved 91%, 93%, and 92% correct solutions. But on a post-test, where students were given 3 new well-structured problems, 1 complex problem, and 1 graphic representation problem, the “productive failure” group performed much better!

Veronica Yan compared this to the common Japanese teaching practice, where students work on complex problems and write out their incorrect solutions for all to see, and receive diagnostic correction. Exploring mistakes and receiving corrective feedback appears to be a powerful way of learning.

Attitudes and assumptions that impede learning

Veronica Yan concluded by suggesting that students and teachers need to change attitudes toward mistakes and learning. Massed and blocked study feels more effective, and yields better short-term results. More effective study that is spaced and interleaved feels more difficult, and takes more effort, but yields better long-term results. Indeed, if learning feels easy, then you may not really be learning.

I thought about my own students’ mixed reactions to my efforts to institute “desirable difficulties” in my classes. Some get angry that I’m not teaching, and say so in their course evaluations. Others get discouraged when they feel confused, and give up. I fear that in making mistakes, they say to themselves, “I’m just bad at this” or “this must not be the right subject for me”. I fear even more that making mistakes may reinforce any stereotype threat. I raised these concerns with Veronica after her talk, and she suggested that we need to find ways to help students change their attitudes and be better informed about how they learn best.

Posted in Teaching and learning biology | 1 Comment

Flipping a large Intro Bio class – round 2

After reviewing results and student responses to my first iteration of flipping my large Intro Biological Principles class last fall, I made a few revisions for this go-round.

The #1 student complaint from last fall was that watching the 30+ minute lecture videos before class took additional time. My own observations were that too many students came to class unprepared, and that watching the videos on-line could not be any more engaging than listening to a live lecture. On the other hand, other students really liked the flexibility of the lecture videos.

I also think I made a strategic mistake by making too big a deal of flipping the class, presenting it as a wholly new experiment and announcing that I would not lecture. That added to a sense of melodrama and fueled complaints from some students that I was depriving them of live lectures.

So I made a few changes.

1. Scaffolding with an un-textbook: I would provide students with the essential concepts distilled to a web page, with links to additional material. I hope that students will see this as a service, and that if they can largely substitute the web page for the textbook reading, they will feel like this new format saves them some time.

2. No more 30+ minute lecture videos. Some concepts outlined on the un-textbook web page have embedded lecture videos, split into segments that each explain just one key concept. The longest video is about 11 minutes, but most run 5-6 minutes.

3. Flipping with stealth: I am not making any announcement about flipping the class. Instead, I am taking the approach that this is a normal, standard way to teach. Perhaps some students won’t even notice that I’m not lecturing.

4. Explaining the intent of each part of the class. I begin with clicker questions largely based on the assigned reading (web page), calling it “retrieval practice,” explaining that studies have shown that the very attempt to remember something helps learning. I tell them that I examined their performance on the on-line homework (Mastering Biology), and that the day’s in-class activities will help them with the concepts that they found most challenging. I also show them where they could have found the information to answer the most difficult homework questions.

5. Adjusting some in-class activities to better suit students. Again, based on student reaction and performance from the first time, I revised some of the activities to better suit the time available and the level of student understanding.

That’s my 5-point plan.

Posted in Teaching and learning biology | 2 Comments

What might geology be telling us about the fossil record?

One of the themes of our evolution module is that the geosphere and biosphere co-evolved throughout Earth history. Evolution of life profoundly affected the geochemistry of the planet, and changing geological conditions in turn repeatedly stirred the pot of evolution. Ars Technica has a couple of nice stories on the gaps in the geological and fossil records.

The first deals with the question of how faithfully the fossil record tells the history of life. Evolutionary biologists from Darwin onwards have fretted about gaps in the fossil record, and creationists have seized on these gaps to state that lack of transitional forms falsifies evolutionary theory. Clearly, the fossil record depends on the deposition of sedimentary rocks and how well different types of organisms, particularly those with only soft body parts, become fossilized.

Scott Johnson’s article “The rock record got a bad rap. Fossil diversity accurately reflects history” discusses a Science paper by Hannisdal and Peters (2011), showing that both the rate of sedimentary rock formation and fossil diversity depend on environmental changes. The same factors such as sea level changes affect both global sedimentation rates and biological diversity. Therefore, the fossil record may be a more accurate indicator than previously thought of the history of life’s diversity.

An example of common-cause relationships among an environmental parameter (sea level), the rock and fossil records, and biodiversity. Arrows denote the direction of causality; “+” and “−” signs indicate measured or inferred positive and negative relationships. From J Crampton 2011 Science 334:1073-1074

The second article by Scott Johnson, “Missing rocks may explain why life started playing shell games,” addresses the Cambrian explosion and the paucity of the fossil record preceding this period of great diversification of animal life. The geological record has a globally widespread gap, called the “Great Unconformity.”

Stratigraphy of the Grand Canyon, Wikimedia Commons, as shown in Scott Johnson’s article, “Missing rocks may explain why life started playing shell games.”

Johnson discusses a paper in Nature by Peters and Gaines (2012), which proposes that both the Great Unconformity and the Cambrian explosion could be explained by sea level rise.

Fig. 4 from Peters & Gaines 2012 The shift from widespread continental denudation to widespread sedimentation on the continents defines the Great Unconformity.

The sea level rise and increased weathering led to a buildup of ions and salts in the ocean, including calcium, that led to the formation of shells. Such hard protective coatings could then have driven an evolutionary arms race and rapid increases in body size, that are preserved in the fossil record as the Cambrian Explosion.

These two papers, both with Shanan Peters at the U. of Wisconsin, and well-explained by Scott Johnson, suggest that the fossil record and the geological record both record changes in global environment that shaped the evolutionary history of life. By reading the two together, we can have greater confidence that we’re not missing large chunks of this historical record.

http://arstechnica.com/science/news/2012/04/missing-rocks-may-explain-why-life-started-playing-shell-games/

http://arstechnica.com/science/news/2012/02/fossil-hips-dont-lie-rock-record-got-a-bad-rap.ars

Crampton, J. 2011 What drives biodiversity changes? Science 334:1073-1074 DOI: 10.1126/science.1214829

Hannisdal, B and SE Peters 2011 Phanerozoic Earth system evolution and marine biodiversity Science 334: 1121-1124 DOI: 10.1126/science.1210695

Peters SE and RR Gaines 2012 Formation of the “Great Unconformity” as a trigger for the Cambrian explosion Nature 484: 363–366 doi:10.1038/nature10969

Posted in Teaching and learning biology | Leave a comment

An idea for an untextbook for intro biology

Having reviewed significant parts of both standard textbooks and the recent on-line texts by Nature and OpenStax, I’m convinced that we need a radical departure. I call it the “untextbook”. I even have a title: An Evolutionary Framework for Biology.

This title contains double meanings. It’s “evolutionary” because the sequence of topics would start with evolution and discuss all subsequent topics from an evolutionary perspective:

1) historical development of scientific inquiry

2) definition and origin of life

3) earth history and evolutionary processes

4) molecules and membranes

5) cells and energy – from prokaryotes to eukaryotes

6) cellular reproduction, genetics and gene expression

7) evolution & diversification of major groups of organisms

8) interaction of cells & organisms with their environment

9) ecosystems, biomes, global change

It’s also “evolutionary” in the sense that, as a truly open source material, it will continually evolve through contributions from both instructors and students, and many forked versions will be modified and adapted to serve local needs.

It will be a “framework”, first in the sense that all intro textbooks are a framework that broadly surveys the various subdisciplines in the field. It’s also a “framework” in that the actual text will be skeletal; much of the content will consist of links to videos, animations, interactives, news articles, and blogs, with a sparse narrative to weave and introduce the topics. If someone has written and made public an explanation of a topic that is better than anything I could write, why not have students read that? I envision a collection of suggested links contributed by instructors and students, with curation, ideally by peer rating and comments.

And here’s the thing: it’s an untextbook because it will not be worth printing. Much or most of the value will be in the linked sources. It will best be viewed on a computer or tablet with an internet connection. It will be user-editable so the user can annotate his or her own notes, questions and comments (no more highlighting!).

I’d love to know what you all (instructors and students) think of this idea. I’m somewhat frustrated that these “new” biology texts from Nature and OpenStax stick to the paradigm laid out by Campbell & Reece. Campbell & Reece (& now others) is an excellent text for majors, as are Freeman et al. and Sadava et al. It’s just that the differences among them are splitting pedagogical hairs, and I’m getting impatient with the whole concept of a traditional textbook for teaching and learning. I see so much great material for students published on the web every year; why not take advantage of it?

Posted in Teaching and learning biology | 4 Comments

Working with 23andMe exome data: my CF allele and the need for verification

In my previous post, I described the summary report that accompanied my 23andMe exome sequence data, with summary statistics and the filtering scheme used to arrive at a list of 21 rare, moderate-to-high predicted impact variants. Here I will show how I used publicly available, free and open-source tools to explore my exome sequence further. I started with the 23andMe list of 21 variants of interest. My questions were:

  • Can I visualize these variants in the mapped sequence reads? See trust but verify below.
  • What are the functions of the affected genes?
  • How likely are the variants to seriously affect the functions of their genes?

Answers to these questions would give me a better sense of how interesting or worrisome these variants may be, not just for my health, but for my daughter, who has a 50% chance of inheriting any one of these variants.

Bam files and samtools

The bam file is a compressed, binary version of the SAM (sequence alignment/map) file format. This file contains all of the sequence reads aligned to the reference genome sequence. These sequence alignments are used to determine whether your DNA has a sequence difference from the reference, whether you are heterozygous or homozygous for the variant, and how statistically reliable the genotype determination is, based on the number of times that particular base has been read (depth of coverage) and the quality of the sequence reads at that position. The accompanying index file, with a .bai filename extension, does not hold the sequences themselves; it records where the alignments for each genomic region sit within the bam file, so that tools can jump directly to a region of interest.

If you are comfortable working with the command line, samtools is a software package that allows you to work with the bam file, to extract alignments for individual chromosomes, to convert bam to sam and vice versa, to create .bai index files, and to export the variant calls in the .vcf variant call format.
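The samtools tasks mentioned above can be sketched as a small Python helper that only assembles the command lines, so you can inspect them before running anything. This is a minimal sketch: the file names and the chr7 region are placeholders, the helper name samtools_commands is made up for illustration, and actually running the commands assumes samtools is installed and on your PATH.

```python
import shlex


def samtools_commands(bam, region, out_prefix):
    """Build (but do not run) a few common samtools command lines."""
    region_bam = f"{out_prefix}.bam"
    return {
        # Create the .bai index; region extraction below needs it.
        "index": ["samtools", "index", bam],
        # Extract the alignments for one chromosome or region into a new bam (-b).
        "extract": ["samtools", "view", "-b", "-o", region_bam, bam, region],
        # Convert the extracted bam to text sam, keeping the header (-h).
        "to_sam": ["samtools", "view", "-h", "-o", f"{out_prefix}.sam", region_bam],
    }


# Placeholder names; substitute your own files.
for name, cmd in samtools_commands("LF1396.bam", "chr7", "LF1396.chr7").items():
    print(name, "->", shlex.join(cmd))
```

Each value is an argument list that could be handed to subprocess.run once you are happy with it.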

One way to actually see the reads aligned against the reference is to use the Integrative Genomics Viewer (IGV) from the Broad Institute. You select the reference genome used to create the alignments in the bam file, and load your bam file. The genome viewer uses the information in the .bai index file to show all the sequence reads “piled up” below the reference sequence. You can load multiple bam files (each must have a corresponding .bai index file) for comparison.

Example: my CF allele

I was somewhat surprised that a CFTR variant showed up on my 23andMe report. I use cystic fibrosis to integrate multiple concepts in my Intro Biology class (see my post on CF, a case study for membranes and transport). I knew that CF was most common among Northern Europeans, with much lower frequencies in Asians. But here it was, a mutation causing a non-conservative amino acid substitution. Changing glutamic acid (E) to glycine (G) could have serious consequences for either protein folding or activity, because the two amino acids are so different: glutamic acid has a negatively charged side chain, whereas glycine’s side chain is just a single hydrogen atom.

23andMe report for my CFTR variant

Integrative Genomics Viewer

To see the underlying data for myself, I loaded up my bam file in the IGV:

Snapshot of IGV window showing reads aligned to chr7:117175372. At this position, approximately half the reads in LF1396.bam have the reference ‘A’ and half have the alternate ‘G’. The track above shows LF1396.vcf.gz (the gzipped version of the vcf file).

My heterozygous genotype for this variant is obvious, and well-supported by high read depth and almost equal numbers of reads bearing the two alleles.
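The reasoning here (roughly equal numbers of reads for each allele imply a heterozygote) can be sketched in a few lines. This is an illustration only: the read counts are made up, and the 0.25/0.75 cut-offs are arbitrary assumptions. Real variant callers use statistical models that also weigh base and mapping quality.

```python
def allele_balance(ref_reads, alt_reads):
    """Fraction of reads at a site that carry the alternate allele."""
    depth = ref_reads + alt_reads
    if depth == 0:
        raise ValueError("no reads cover this position")
    return alt_reads / depth


def call_genotype(ref_reads, alt_reads, low=0.25, high=0.75):
    """Crude genotype call from the alternate-allele fraction.

    The low/high cut-offs are illustrative assumptions, not a standard.
    """
    fraction = allele_balance(ref_reads, alt_reads)
    if fraction < low:
        return "homozygous reference"
    if fraction > high:
        return "homozygous alternate"
    return "heterozygous"


# Hypothetical counts: 52 reads with the reference 'A', 48 with the alternate 'G'.
print(call_genotype(52, 48))
```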

NCBI and dbSNP

This variant is identified as a previously characterized single nucleotide polymorphism (SNP) in dbSNP. I looked it up by going to the NCBI home page and simply entering “rs121909046” in the search field and selecting the SNP database. The dbSNP page provides a wealth of information about this particular variant, identifying it as a probable pathogenic variant.

dbSNP page for rs121909046

Before I got too excited about the pathogenic part, I noted that this SNP is annotated as Glu217Gly, or E217G, affecting the 217th amino acid in the CFTR protein, whereas the 23andMe report annotates this as E187G, affecting the 187th amino acid. A quick check of the human CFTR protein sequence showed that the 187th amino acid is Asn (= N in single letter amino acid code). I don’t know why the 23andMe report has this discrepancy, but similar errors in identifying amino acid changes showed up in other gene variant reports (see Trust but verify, below).
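The “quick check” described above amounts to asking whether the protein really has the stated reference residue at the stated position. A minimal sketch follows; the helper names are made up, and the sequence is a toy string rather than the real CFTR sequence, which you would paste in from a protein database.

```python
import re


def parse_substitution(label):
    """Split a label such as 'E217G' into (reference, position, variant)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"not a single-residue substitution: {label!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def reference_matches(protein, label):
    """True if the protein has the stated reference residue at that 1-based position."""
    ref, pos, _alt = parse_substitution(label)
    return 1 <= pos <= len(protein) and protein[pos - 1] == ref


# Toy sequence for illustration only (NOT CFTR).
toy_protein = "MKNLE"
print(reference_matches(toy_protein, "E5G"))  # residue 5 of the toy sequence is E
print(reference_matches(toy_protein, "E3G"))  # residue 3 of the toy sequence is N
```

An annotation that fails this check, as E187G does against the real CFTR sequence, is worth a second look.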

OMIM

Scrolling down, near the bottom of the dbSNP page is a link to an OMIM (Online Mendelian Inheritance in Man) entry. Clicking on it took me to the subsection of the OMIM article on CFTR that cites a journal article concerning this variant. I followed links to the article by Lee et al. (2003), which is available free full-text. I learned that the E217G variant appears in the Korean population with an allele frequency around 1.3%, and that heterozygotes have a higher risk of bronchiectasis, but not of pancreatic insufficiency. Molecular studies revealed that this mutation causes a 60% reduction in the amount of CFTR protein that appears on the membrane. It is a relatively mild disease allele.
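To put the 1.3% allele frequency in perspective, one can estimate how many people would be expected to carry one copy. The sketch below assumes Hardy-Weinberg equilibrium, which is my assumption and not something the paper is quoted as stating.

```python
def carrier_frequency(allele_freq):
    """Expected heterozygote frequency 2pq, assuming Hardy-Weinberg equilibrium."""
    q = allele_freq
    p = 1.0 - q
    return 2.0 * p * q


# Allele frequency of about 1.3% for E217G, as quoted above.
print(f"{carrier_frequency(0.013):.4f}")  # prints 0.0257
```

Under that assumption, roughly 2.6% of the population, or about 1 person in 39, would be a heterozygous carrier.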

UCSC Genome Browser

What more can I learn about this mutation? I can view it in the UCSC genome browser, which offers some additional information. I can even load my bam or vcf file as a custom track. I decided to load my vcf file to view just the variant annotations, rather than clutter up my view with all the read alignments. To do this, I had only to go to the directory containing my exome data files, which include the gzipped vcf file, LF1396.vcf.gz, and run the following command: tabix -p vcf LF1396.vcf.gz to get a binary index file, LF1396.vcf.gz.tbi. I followed the helpful directions at http://genome.ucsc.edu/goldenPath/help/vcf.html to load this as a custom track.
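If you would rather peek at the vcf file directly than load it into a browser, a few lines of Python will do. This is a minimal sketch under stated assumptions: the function names are made up, it relies only on the first five tab-separated VCF columns (CHROM, POS, ID, REF, ALT), and it assumes chromosomes are named in the “chr7” style, which may not match a given file.

```python
import gzip


def find_variants(lines, chrom, pos):
    """Yield (ref, alt, variant_id) for VCF data lines located at chrom:pos."""
    for line in lines:
        # Skip blank lines and the '#'-prefixed header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if fields[0] == chrom and int(fields[1]) == pos:
            yield fields[3], fields[4], fields[2]


def find_in_gzipped_vcf(path, chrom, pos):
    """Scan a gzip-compressed vcf file line by line (slow but simple)."""
    with gzip.open(path, "rt") as handle:
        return list(find_variants(handle, chrom, pos))
```

For example, find_in_gzipped_vcf("LF1396.vcf.gz", "chr7", 117175372) would scan the whole file for the CFTR position discussed above. For repeated lookups, the tabix index is the faster route.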

UCSC Genome Browser view of CFTR E217 gene location with JC exome added as a custom track

What I like about this view are the alignments with other vertebrates. The E217 in human CFTR is conserved throughout vertebrates, from fish to primates, with either glutamic acid (E) or aspartic acid (D) at this position. Aspartic acid and glutamic acid are chemically similar, both having acidic side chains, and are often interchangeable. Given such conservation at this position, having a glycine at this position most likely would affect the protein adversely.
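The conservation argument can be phrased as a simple check on one column of the multiple-species alignment: is every residue acidic? The column below is a made-up example, not the actual browser data, and the function name is hypothetical.

```python
ACIDIC = {"D", "E"}  # aspartic acid and glutamic acid


def column_is_acidic(column):
    """True if the column is non-empty and every residue is D or E."""
    return bool(column) and all(residue in ACIDIC for residue in column)


# Hypothetical alignment column, one residue per species.
column = ["E", "E", "D", "E", "D"]
print(column_is_acidic(column))          # every residue is acidic
print(column_is_acidic(column + ["G"]))  # a glycine breaks the pattern
```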

Trust but verify

I used these tools to examine all 21 of my “interesting” variants identified in my 23andMe exome summary report. Of most concern to me were variants that the report stated would cause non-conservative amino acid changes in my MSH2 and PRNP proteins.

MSH2 is a DNA repair gene. Even heterozygotes have a markedly higher risk of cancer, particularly colon cancer, because of the real possibility that a somatic mutation will knock out the only functional copy of this gene. Such cells, unable to repair DNA damage, would rapidly accumulate other mutations that could lead to cancer.

PRNP is the prion protein. Altered folding of this brain protein leads to an infectious protein particle that causes slow neurodegenerative disease, as in mad cow disease and scrapie. Mutations in this protein are associated with familial (inherited) neurodegenerative diseases.

Both my MSH2 and PRNP mutations are already in dbSNP. I learned that my PRNP variant is known to be non-pathogenic. A huge relief! But whereas the 23andMe report identified the amino acid change as E158K (glutamic acid at position 158 changed to lysine), both dbSNP and IGV show that it is actually E219K. So the amino acid change was correctly identified, but at the wrong position, just like my CFTR mutation. More puzzling, and of greater concern, was that my MSH2 mutation was identified as A43D. But dbSNP and the BAM alignment viewed in IGV both show that this is a “silent” mutation that results in no amino acid change, involving glycine at either the 157th position or the 91st position, depending on the form of the protein (MSH2 has multiple forms resulting from alternative splicing).

I found a similar discrepancy in at least one other gene, where a silent mutation was incorrectly identified as an amino acid change. Overall, the 23andMe annotations of amino acid changes were a very mixed bag: about half of the changes were incorrectly identified by position or, more seriously, by type (amino acid change versus silent). This indicates a need to re-run that part of the analysis (SnpEff).

Learning my genetic heritage

Leaving aside these discrepancies on the effects of mutations, I found that one of the accurately annotated variants, in my GALK1 (galactokinase) gene, has a name, the “Osaka variant”, and occurs with a frequency of 4% among Japanese and 3% among Koreans (Okano et al., 2001). The Osaka variant is associated with bilateral cataract formation in elderly Japanese people. My mother had to undergo Lasik eye surgery for cataracts in both eyes; I strongly suspect that I inherited this allele from her.

Galactokinase deficiency is a finding of a kind that gladdens the hearts of personal genomics practitioners, because it is an “actionable” variant, meaning that a person can do something about it. In this case, the action is as simple as avoiding foods that contain high levels of galactose: milk and beets. I’m not fond of beets, and I drink milk only occasionally, but I would have a tough time giving up cheese and ice cream entirely.

In this exercise, I used several free, publicly available tools and databases to view and obtain a wealth of information about specific candidate variants. I learned that I carry a couple of unsuspected pathogenic alleles, for cystic fibrosis and galactokinase deficiency. Both alleles occur at relatively high frequencies in the Korean population. I don’t expect that the CF allele will affect my life, but I will be avoiding milk and moderating my ice cream consumption, in case my heterozygosity for the Osaka allele elevates my future risk for cataracts.

This is just the beginning of my exploration of my exome. Future blog posts will discuss various alternative ways to sift for “interesting” variants.

References:

Lee, J. H., Choi, J. H., Namkung, W., Hanrahan, J. W., Chang, J., Song, S. Y., Park, S. W., Kim, D. S., Yoon, J.-H., Suh, Y., Jang, I.-J., Nam, J. H., Kim, S. J., Cho, M.-O., Lee, J.-E., Kim, K. H., Lee, M. G. A haplotype-based molecular analysis of CFTR mutations associated with respiratory and pancreatic diseases. Hum. Molec. Genet. 12: 2321-2332, 2003. [PubMed: 12952861] [Full Text: HighWire Press, Pubget]

Okano, Y., Asada, M., Fujimoto, A., Ohtake, A., Murayama, K., Hsiao, K.-J., Choeh, K., Yang, Y., Cao, Q., Reichardt, J. K. V., Niihira, S., Imamura, T., Yamano, T. A genetic factor for age-related cataract: identification and characterization of a novel galactokinase variant, ‘Osaka,’ in Asians. Am. J. Hum. Genet. 68: 1036-1042, 2001. [PubMed: 11231902, related citations] [Full Text: Elsevier Science, Pubget]

Posted in human genetics | 2 Comments

A first look at my exome variants from 23andMe

About 5 months after I sent my saliva to 23andMe, I received an email that my exome results were ready. The data were in a large (4.2 GB) encrypted file folder that could be opened only after I had downloaded and installed TrueCrypt. Eventually I was able to download and unpack my data. These data consist of 4 files, all labeled with my identifier, “LF1396”, and ending with .bam, .bai, .vcf.gz, and .report.pdf. The .bam file contains the alignments of the Illumina reads to the human reference sequence, the hg19 release. The .bai file is an index file of the read alignments. The .vcf.gz is a zipped .vcf file, for “variant call format” developed by the 1000 Genomes Project, in the latest version 4.1. And the pdf report is a 17-page summary explaining the file formats, my “exome at a glance” summary statistics, and a description of the filtering scheme used to select 21 variants of interest. The rest of the report describes each of the 21 variants, sequentially filtered for high or moderate predicted effect, occurring at low frequency (<1%), in genes involved in Mendelian disorders.

Figure 1. Bases sequenced and exome coverage. A: number of bases sequenced; top line indicates total coverage of 117X. B: Number of called bases in exome. Small red sliver indicates variants from reference genome (hg19).

Figure 1A shows that a little under 4 billion bases align to or near the targeted exons. These on-target and near-target bases map to about 120 million exonic positions. The vast majority of the exonic base calls are identical to the human reference genome.

Figure 1C: Variant calls listed in the vcf file.

About 0.1% of the exonic base calls are variant compared to the reference sequence. Figure 1C shows that these variants consist of about 100,000 single-nucleotide polymorphisms (SNPs) and 10,000 insertions/deletions (indels). These numbers are consistent with unrelated humans sharing 99.9% DNA sequence identity.
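A back-of-the-envelope check of the “about 0.1%” figure, using the rounded counts quoted above. It is only a rough sketch: it treats each indel as a single variant position, which is a simplification.

```python
def variant_fraction(snps, indels, exonic_positions):
    """Fraction of called exonic positions that differ from the reference."""
    return (snps + indels) / exonic_positions


fraction = variant_fraction(snps=100_000, indels=10_000, exonic_positions=120_000_000)
print(f"{fraction:.2%}")  # prints 0.09%
```

So roughly 1 position in 1,100 differs from the reference, in line with the 99.9% identity figure.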

Given over 100,000 total variants, which should I look at first? Which of these are most likely to influence my health or appearance or behavior? Which of these have the most impact on me being me? Although 23andMe specifically stated that no consumer-level interpretation would be provided as part of their pilot exome sequencing project, they do provide annotation of the variant calls, in the vcf file.

Figure 2. Classification of variants by predicted impact on gene function.

Figure 2 from the 23andMe exome report shows the distribution of my approximately 110,000 variants categorized according to their predicted impact on gene function.

  • High impact variants include gain of premature stop codons (nonsense mutations), frameshifts, splice site alterations, and loss of stop codons. My exome sequence contains 634 of these.
  • Moderate impact variants include non-synonymous substitutions (amino acid changes) and codon insertions and deletions (addition or deletion of amino acid residues). My exome sequence contains 11,504 of these.
  • Low impact variants include synonymous substitutions (no change in amino acid sequence) or gain of a start codon.
  • Unknown impact variants are those “unlikely to affect gene products” – presumably because they occur in non-exonic (intergenic or intronic) sequences.

Another way to look at these variants is by frequency in the human population. Variants that occur at high frequency are less likely to have serious adverse consequences. Conversely, it’s tempting to think that rare and unique variants may contribute to me being such a unique and rare individual.

Figure 3. Variant frequencies.

Figure 3 shows that about 15% of my exome variants are rare (occur at <1% frequency) or previously unidentified (unique). As more exomes and whole genomes are sequenced, the proportion of “unique” variants will diminish, but the 15% proportion of rare variants is unlikely to shift significantly. After all, you have to figure that most of the common variants have already been identified.

These classifications can be combined to filter the variants, first by predicted effect, then by frequency, to identify those variants with high or moderate predicted impact, that are rare. Then 23andMe asked whether any of these filtered variants occur among a list of 592 genes “involved in Mendelian disorders” (Figure 4).

Figure 4. Variant filtering process.
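The sequential filter described above (predicted impact, then frequency, then membership in a gene list) can be sketched as follows. This is not 23andMe’s actual code: the dictionary layout, the impact labels, and the gene names are hypothetical placeholders, and treating never-before-seen variants as rare is my assumption.

```python
def filter_variants(variants, mendelian_genes, max_freq=0.01):
    """Keep rare, high- or moderate-impact variants in genes of interest.

    Each variant is a dict with keys 'gene', 'impact', and 'frequency';
    a frequency of None means the variant has not been seen before.
    """
    kept = []
    for variant in variants:
        # Step 1: predicted impact must be high or moderate.
        if variant["impact"] not in ("HIGH", "MODERATE"):
            continue
        # Step 2: population frequency must be below the cut-off (default 1%).
        if variant["frequency"] is not None and variant["frequency"] >= max_freq:
            continue
        # Step 3: the gene must be on the list of Mendelian-disorder genes.
        if variant["gene"] not in mendelian_genes:
            continue
        kept.append(variant)
    return kept


# Made-up records for illustration.
variants = [
    {"gene": "GENE_A", "impact": "MODERATE", "frequency": 0.002},
    {"gene": "GENE_B", "impact": "LOW", "frequency": 0.002},
    {"gene": "GENE_C", "impact": "HIGH", "frequency": 0.20},
    {"gene": "GENE_D", "impact": "HIGH", "frequency": None},
]
print([v["gene"] for v in filter_variants(variants, {"GENE_A", "GENE_C"})])
```

Only GENE_A survives all three steps here: GENE_B fails on impact, GENE_C on frequency, and GENE_D is not on the gene list.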

This filtering scheme resulted in a list of 21 variants. All 21 on my report were predicted to have “moderate” impact, and all were non-synonymous substitutions. But even a cursory look through these 21 amino acid changes reveals that some are more likely to affect protein structure or function than others. For example, some are conservative amino acid changes, where the variant amino acid has physico-chemical properties similar to those of the original amino acid. Examples are L25V (leucine at amino acid position 25 changed to valine; both have hydrophobic side chains) and I929V (again, isoleucine and valine are both hydrophobic). Other changes are potentially more disruptive, where the variant amino acid has very different properties from the original. Examples are R1125W, with arginine (a positively charged side chain) replaced by tryptophan (large hydrophobic side chain); E158K and E482K, which substitute positively charged lysine for a negatively charged glutamic acid; and R150C, which puts cysteine in place of arginine.
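The conservative versus non-conservative distinction can be sketched as a lookup of side-chain classes. The grouping below is one simplified textbook-style scheme chosen for illustration (others exist, and they disagree on residues such as tyrosine and histidine); it is not the classification used in the 23andMe report.

```python
import re

# One simplified grouping of the 20 amino acids by side-chain character.
GROUPS = {
    "hydrophobic": "AVLIMFW",
    "polar": "STNQY",
    "positive": "KRH",
    "negative": "DE",
    "special": "CGP",  # cysteine, glycine, proline: each treated as its own case
}
CLASS_OF = {aa: name for name, members in GROUPS.items() for aa in members}


def classify(label):
    """Label a substitution such as 'L25V' as conservative or non-conservative."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"not a single-residue substitution: {label!r}")
    ref, alt = match.group(1), match.group(3)
    if ref == alt:
        return "synonymous"
    same_class = CLASS_OF[ref] == CLASS_OF[alt] and CLASS_OF[ref] != "special"
    return "conservative" if same_class else "non-conservative"


for label in ["L25V", "I929V", "R1125W", "E158K", "E482K", "R150C"]:
    print(label, classify(label))
```

With this grouping, L25V and I929V come out as conservative and the other four as non-conservative, matching the examples in the text.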

The report does not say whether I am homozygous or heterozygous for any of these 21 variants. I presume that I am heterozygous for all of them (23andMe excluded X and Y chromosome genes). I can check these myself by looking them up in the vcf file (that will be a later post).
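In a vcf file, zygosity is recorded in the GT (genotype) sub-field of each sample column, where 0 stands for the reference allele and 1 for the first alternate allele. A minimal sketch of reading it, with a made-up sample field and a hypothetical function name:

```python
def zygosity(sample_field, format_field="GT"):
    """Interpret the GT entry of one VCF sample column.

    format_field is the colon-separated FORMAT column (e.g. 'GT:AD:DP');
    sample_field holds the matching colon-separated values.
    """
    values = dict(zip(format_field.split(":"), sample_field.split(":")))
    # Alleles are separated by '/' (unphased) or '|' (phased).
    alleles = values["GT"].replace("|", "/").split("/")
    if "." in alleles:
        return "no call"
    if len(set(alleles)) > 1:
        return "heterozygous"
    return "homozygous reference" if alleles[0] == "0" else "homozygous alternate"


# Hypothetical sample column: genotype, per-allele read depths, total depth.
print(zygosity("0/1:30,28:58", "GT:AD:DP"))
```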

This post gives curious readers a sense of what they can expect at this point if they have their exome sequenced by 23andMe. Clearly, this barely scratches the surface of one tiny corner of the exome sequence data. In my next post, I will present some open-source tools for looking at and sifting through the data yourself. In the meantime, I am making my vcf file publicly available here: http://dl.dropbox.com/u/69564734/LF1396.vcf.gz

Posted in human genetics | 23 Comments

Did Life Begin with “RNA on Steroids”?

The “RNA world” hypothesis posits that life, and biological evolution, began with self-replicating RNA molecules. Before DNA, before protein enzymes, RNA molecules both stored hereditary information, and performed the catalytic functions required for replication. All cells today still depend on RNA catalysis for some core functions such as protein synthesis by the ribosomes, where the ribosomal RNA molecule forms the peptidyl transferase catalytic center, rather than any of the ribosomal proteins. Other enzymatic RNAs (ribozymes) catalyze self-splicing, RNA cleavage, RNA ligation, and RNA polymerase activities.

All reactions involving ribozymes require divalent cations, preferably Mg2+. Recent and ongoing research by the laboratories of Loren Williams, Nick Hud, Roger Wartell, and Stephen Harvey at Georgia Tech asked whether Fe2+ could substitute for Mg2+ in RNA folding and catalysis (Athavale et al. 2012). Fe2+ is a soluble form of iron that was abundant in the Earth’s early oceans until cyanobacterial ancestors began to produce large quantities of molecular oxygen (O2), around 2.7 billion years ago. In the presence of O2, Fe2+ is oxidized to Fe3+, which is insoluble and precipitates in the form of iron oxide (rust). This massive precipitation of iron oxide onto ocean sediments formed banded iron formations that can be found around the world.

Athavale et al. discovered that Fe2+, with coordination geometry similar to Mg2+, could indeed substitute for Mg2+ in folding and catalysis by ribozymes. Moreover, Fe2+ enhanced ligation by the L1 ribozyme ligase by 25-fold and cleavage by a hammerhead ribozyme by 3-fold compared to equivalent concentrations of Mg2+.

Athavale et al. 2012 Fig. 3 Ribozyme activity is enhanced by Fe2+ compared to Mg2+.
A) L1 ribozyme ligase activity is enhanced in Fe2+ compared to Mg2+. Reaction progress was monitored by gel electrophoresis. B) Hammerhead ribozyme activity is enhanced in Fe2+ compared to Mg2+. Reactions were monitored by both gel electrophoresis and capillary electrophoresis, which gave similar results.

These results help address one of the criticisms of the RNA world hypothesis, that RNA catalysis is slow and inefficient. In the anoxic oceans of the Archaean eon (from the origin of life to circa 2.5 billion years ago), dissolved Fe2+ would have greatly enhanced the catalytic activities of RNA molecules. In the words of Loren Williams, RNA in the presence of Fe2+ was like “RNA on steroids” (H Thompson 2012).

References:

Athavale SS, AS Petrov, C Hsiao, D Watkins, CD Prickett, JJ Gossett, L Lie, JC Bowman, E O’Neill, CR Bernier, NV Hud, RM Wartell, SC Harvey, LD Williams, 2012. RNA folding and catalysis mediated by Iron (II), PLoS ONE 7(5): e38024. doi:10.1371/journal.pone.0038024

H Thompson 2012, Dissolved iron may have been key to RNA-based life, Nature News Blog http://blogs.nature.com/news/2012/06/dissolved-iron-may-have-been-key-to-rna-based-life.html

Posted in Teaching and learning biology | Leave a comment