Forum
Massively Parallel Undergraduates for Bacterial Genome Finishing
Mobilizing students for genome research can yield useful results and a valuable educational experience
Brad W. Goodner, Cathy A. Wheeler, Prudy J. Hall, and Steven C. Slater
Brad W. Goodner is Associate Professor, Cathy A. Wheeler is Laboratory Teaching Associate, and Prudy J. Hall is Professor in the Department of Biology, Hiram College, Hiram, Ohio, and Steven C. Slater is Senior Scientist at Monsanto Company, Middleton, Wis.
Think about an undergraduate laboratory section. Now imagine that every result from every team at every bench produces not a predetermined right answer, but a bit of novel data providing new insight into an organism. Imagine those small discoveries multiplied hundreds of times across a large number of institutions, and a means to assemble the data into a coherent whole. Finally, consider the excitement and motivation that come from learning while making real contributions to the field of biology.
Over the past several years, in the context of our ongoing genomics collaboration, we have been developing undergraduate biochemistry, genetics, microbiology, and molecular biology courses in which every experiment is designed to help complete a bacterial genome project. In our experience, genomics and bioinformatics have the inherent opportunities and scientific impact needed to successfully meld research with undergraduate education. We see an opportunity to create Genomics Cooperatives that reshape a portion of undergraduate science education while completing many of these genome projects.
There is an urgent need to strengthen biology education through more interdisciplinary, hands-on learning, and a focus on experimental thinking and problem solving. Here we outline strategies that lay the framework for productive genomics programs at undergraduate institutions. Shotgun sequence is provided by a large genome center, and the time-consuming tasks of genome closure and annotation are accomplished primarily by undergraduates during their coursework. In collaboration with finishers at a genome center, a large number of parallel efforts would permit completion of many bacterial genome projects. We used this strategy successfully to complete the Agrobacterium tumefaciens C58 genome sequence, and we are currently applying it to several additional genomes (B. Goodner et al., Science 294:2323-2328, 2001).
Genome and subgenome libraries. A library of random genomic fragments, sufficient sequencing to ensure fivefold or more genome coverage, and computer pattern-matching software are all that are needed to initiate a genome project. We would envision the initial sequence coming from one of the larger genome centers, with our proposed cooperatives acquiring the raw data. For organisms with complex genomes (multiple chromosomes and/or plasmids), the shotgun strategy might be supplemented with a set of subgenome libraries made using DNA molecules isolated by pulsed-field gel electrophoresis. Construction of such libraries is within the ability of upper-level students in a typical laboratory course.
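To give students a feel for what "fivefold or more genome coverage" implies, the arithmetic can be sketched in a few lines of Python. This is only an illustration: it uses the standard Lander-Waterman (Poisson) approximation, which is not described in this article, and the genome size and read length in the usage note are hypothetical values rather than figures from any particular project.

```python
import math

def reads_needed(genome_size_bp, read_length_bp, coverage):
    """Number of random reads required to reach a target fold coverage."""
    return math.ceil(coverage * genome_size_bp / read_length_bp)

def expected_fraction_unsequenced(coverage):
    """Poisson estimate of the fraction of bases covered by no read."""
    return math.exp(-coverage)

def expected_contigs(n_reads, coverage):
    """Lander-Waterman estimate of contig count (minimum overlap ignored)."""
    return n_reads * math.exp(-coverage)

if __name__ == "__main__":
    # Hypothetical example: a 5-Mb genome sequenced with 500-bp reads.
    n = reads_needed(5_000_000, 500, 5)
    print(n, "reads for fivefold coverage")
    print(f"{expected_fraction_unsequenced(5):.4f} of the genome left unsequenced")
    print(f"about {expected_contigs(n, 5):.0f} contigs, and so gaps, to close")
```

Under these assumptions, fivefold coverage still leaves a fraction of a percent of the genome unsequenced and several hundred gaps, which is the work that falls to the finishing effort. Real assemblies deviate from the estimate because of repeats and cloning bias.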
Genome mapping. The shotgun strategy relies on elevated sequencing coverage before gap closure can begin. A combined physical/genetic map, by contrast, can provide much of the information needed for contig orientation and assembly while requiring lower overall sequence coverage of the genome. This approach can begin before any DNA sequence is available, and it makes sense in an undergraduate setting because techniques like transposon mutagenesis, characterization of mutants, cloning of transposon insertions, and pulsed-field gel electrophoresis fit well into undergraduate genetics and microbiology courses and independent research projects. Transposons are available that facilitate the cloning of transposon insertion sites. Flanking sequence for a set of scattered mutations (as little as 100 bp), combined with the mutations' physical locations on a restriction map of the genome, allows sequence contigs to be positioned in relation to each other. In addition, searching for transposon-induced mutants with particular phenotypes (e.g., auxotrophs) enables subsequent experiments with mutants of interest and gives a first-pass genomic view of gene organization. Mutant characterization can also make a significant contribution to biological validation of gene predictions (see below).
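The positioning logic described above can be sketched as a small Python function. It is a minimal illustration, not the procedure used in our projects. It assumes each mapped insertion has already been reduced to a hypothetical record of (contig name, offset of the flanking-sequence match within that contig, position on the restriction map), and it treats the map as linear, ignoring the wraparound of a circular replicon.

```python
from collections import defaultdict

def order_contigs(anchors):
    """Order and orient contigs from mapped transposon insertions.

    anchors: iterable of (contig_name, offset_in_contig_bp, map_position_kb).
    Returns a list of (contig_name, orientation) sorted along the map, where
    orientation is "+", "-", or "?" (too few anchors to decide).
    """
    by_contig = defaultdict(list)
    for contig, offset, map_pos in anchors:
        by_contig[contig].append((offset, map_pos))

    placed = []
    for contig, hits in by_contig.items():
        hits.sort()  # sort anchors by their offset within the contig
        mean_pos = sum(m for _, m in hits) / len(hits)
        first_map, last_map = hits[0][1], hits[-1][1]
        if len(hits) < 2 or first_map == last_map:
            orientation = "?"   # one anchor places a contig but cannot orient it
        elif last_map > first_map:
            orientation = "+"   # contig offsets increase along the map
        else:
            orientation = "-"   # contig lies reversed relative to the map
        placed.append((mean_pos, contig, orientation))

    placed.sort()
    return [(contig, orientation) for _, contig, orientation in placed]

if __name__ == "__main__":
    # Hypothetical anchors for three contigs.
    example = [
        ("contig2", 100, 900), ("contig2", 5000, 880),
        ("contig1", 200, 100), ("contig1", 9000, 140),
        ("contig3", 50, 500),
    ]
    print(order_contigs(example))
```

Two anchors per contig are the minimum needed to infer orientation, which is one reason a set of scattered insertions is more useful than a few clustered ones.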
Gap closure. A genome map can help align contigs for subsequent gap closure. Another method for aligning contigs makes use of basic sequence annotation that can be taught in any one of several biology courses. By analyzing 1,000-2,000 bp of sequence from each end of each available contig, students can look for similarity among partial ORFs or for adjacent genes from operons that identify putative adjacent contigs. Regardless of the method used, students then can learn to design PCR primers and use those primers to clone fragments that span adjacent contigs.
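As a classroom illustration of the gap-closure step, the sketch below extracts the ends of a contig for annotation and picks a pair of outward-facing primers for two contigs believed to be adjacent. It makes simplifying assumptions that are ours, not a prescribed protocol: primers are taken directly from the terminal bases, the contigs are already in the same orientation, and melting temperature is estimated with the Wallace rule of thumb (2 °C per A/T, 4 °C per G/C), which is only approximate for short oligonucleotides. A real design would also screen for repeats, hairpins, and primer dimers.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def contig_ends(seq, n=1500):
    """Return the first and last n bases of a contig for end annotation."""
    return seq[:n], seq[-n:]

def wallace_tm(primer):
    """Rough melting temperature in degrees C by the Wallace rule."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc

def gap_primers(left_contig, right_contig, length=20):
    """Pick primers that point into the gap between two ordered contigs.

    The forward primer is the last `length` bases of the left contig; the
    reverse primer is the reverse complement of the first `length` bases
    of the right contig.
    """
    forward = left_contig.upper()[-length:]
    reverse = reverse_complement(right_contig[:length])
    return forward, reverse

if __name__ == "__main__":
    # Hypothetical short sequences, shortened primers for readability.
    fwd, rev = gap_primers("GGGGACGT", "AACCTTTT", length=4)
    print(fwd, wallace_tm(fwd), rev, wallace_tm(rev))
```

If the PCR yields a product, sequencing it from both primers fills the gap; if not, the adjacency hypothesis or the assumed orientation is the first thing to revisit.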
Sequence annotation. With introductory training on the practical use of sequence analysis programs available over the Web, students can annotate short stretches of novel sequence. The novelty is important even in this early training because it spurs ownership, curiosity, and creativity. Through this experience, students gain a better understanding of gene anatomy at the sequence level, what sequence similarity means both practically and evolutionarily, what sequence information can tell us about the biology of an organism, and how we can test it experimentally. Students can move on to learn how some of these programs actually work, allowing them to use the full power of the software on much larger sequence sets.
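One way to help students see gene anatomy at the sequence level before they turn to Web-based tools is to have them read or write a very small open reading frame (ORF) scan. The Python sketch below is a teaching aid under stated simplifications, not a substitute for real gene-finding software: it scans only the forward strand, accepts only ATG as a start codon, and uses the standard stop codons. The minimum length is an arbitrary, hypothetical threshold.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=30):
    """Find forward-strand ORFs in all three reading frames.

    Returns (start, end) pairs as 0-based, end-exclusive coordinates, where
    seq[start:end] begins with ATG and ends with a stop codon. min_codons
    counts the codons before the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None                   # reset and keep scanning this frame
    return orfs

if __name__ == "__main__":
    # Hypothetical toy sequence: ATG, three GCT codons, then a TAA stop.
    toy = "CC" + "ATG" + "GCT" * 3 + "TAA" + "GG"
    for start, end in find_orfs(toy, min_codons=2):
        print(start, end, toy[start:end])
```

Extending the scan to the reverse strand (by running it on the reverse complement) and to alternative start codons makes a natural follow-up exercise, and comparing the output with similarity searches shows students why an ORF alone is not yet a gene.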
Making the impractical, practical: testing gene predictions. In an era of high-throughput annotation, we generally rely on sequence similarity and gene context to infer gene function. Integrating genomics into undergraduate courses deploys a small army of undergraduates for biological validation of many of these gene designations. Construction of site-directed mutations is possible in many culturable organisms, and these experiments fit nicely into microbiology, genetics, and molecular biology courses. The students have an opportunity to make decisions about gene function and then test them in the laboratory. Individual research projects can also be directed toward disruption of putative duplicate genes, or genes of unknown function. Since large contigs are available early in the project, this work can be initiated as soon as the first assembly is complete.
Comparative analyses and environmental sampling. While we have focused on various components of genomic sequencing projects, there are other opportunities for undergraduates in comparative studies of whole genomes and genome segments. Most of these studies will involve preexisting sequence information from multiple genomes, but sequence information from one representative species can be used by students to query many unsequenced members of a larger group. Moreover, rDNA gene characterization from environmental samples opened the door to new approaches for dealing with previously unknown (and most likely unculturable) organisms and mixed populations. The sky is the limit in terms of questions and sampling sites.
Massively parallel undergraduates for genome finishing. We envision a network of genomics cooperatives, each focusing on one or more genomes of interest and involving a combination of faculty and students from one or more undergraduate institutions collaborating with scientists at large genome centers and/or at corporations. A given project could start from scratch or build on preexisting genome drafts available from the Department of Energy Joint Genome Institute or one of the collaborating institutions. Students would be involved in the project initially through course-related research, and many would eventually choose an independent research project. Goals can be set appropriately for each setting and timeframe. The ultimate goal in each case would be a completed genome sequence, including annotation and any other genomic analyses desired by the cooperative. What the strategy lacks in speed it gains in more finished genomes and a strengthened education for undergraduates.
While it will take some effort to provide students with the appropriate training and support needed, the job is made easier because the students rapidly claim ownership of the opportunity to perform research in the context of coursework. The harder problem hiding in the shadows, as pointed out in the NRC report, is to provide faculty with the necessary background, training, and support. The task is feasible within the context of course preparation, but it will require commitments by the faculty involved, their home institutions, their collaborators at other institutions, and funding agencies. Our goal here is to encourage interested individuals from small institutions, major research institutions, corporations, and funding agencies to help us develop genomics cooperatives as a new strategy for melding genome finishing with undergraduate education and research.
We thank Hiram College and Monsanto Company for their continued support of our efforts to meld quality research with undergraduate education. We also thank Jeff Elhai for his own efforts in this area and for many helpful comments on this manuscript.