SYSTEMATIC SEQUENCING OF COMPLEX GENOMES

Biology and medicine are in the midst of a revolution, the full extent of which will probably not be realized for many years to come. The catalyst for this revolution is the Human Genome Project¹ and related activities that aim to develop improved technologies for analysing DNA, to generate detailed information about the genomes of numerous organisms, and to establish powerful experimental and computational approaches for studying genome structure and function. The past few years have seen a remarkable crescendo in accomplishments related to DNA sequencing, with genome sequences being generated for several key experimental organisms, including a yeast (Saccharomyces cerevisiae), a nematode (Caenorhabditis elegans), a fly (Drosophila melanogaster), a plant (Arabidopsis thaliana) and the human (Homo sapiens). Collectively, the generation of these sequence data and others is launching the ‘sequence-based era’ of biomedical research. These accomplishments have been accompanied by the refinement of existing strategies for genome sequencing, as well as the development of new ones. Among these are approaches that make extensive use of large-insert clones and associated physical maps, some that take a whole-genome approach without using clone-based physical maps, and others that use a hybrid strategy that involves elements of the other two. Each of these general strategies for genome sequencing is described in this review. There are many potential uses of genome-sequence data. In some cases, a detailed and accurate sequence-based ‘blueprint’ of a genome is required (for example, to establish a comprehensive gene catalogue and/or to gain insight into long-range genome organization), whereas in other cases, an incomplete survey will suffice (for example, to acquire information about the repetitive sequences in a genome and/or to carry out simple, non-comprehensive comparisons to sequences from other organisms).
Importantly, the intended use(s) of genome-sequence data must be carefully considered when choosing a specific sequencing strategy and defining the end point of a particular project. These issues, as well as the plans for future sequencing initiatives by the Human Genome Project, are also discussed.
