Strategies for whole microbial genome sequencing and analysis

The introduction of methods for automated DNA sequence analysis nearly a decade ago, together with more recent advances in bioinformatics, has revolutionized biology and medicine and ushered in a new era of genomic science, the study of genes and genomes. These technologies have had an impact on many areas of research, including the association between genes and disease, DNA‐based diagnostics, and the sequencing of genomes from human and other model organisms. The demonstration in 1995 that automated DNA sequencing methods could be used to decipher the entire genome sequence of a free‐living organism, Haemophilus influenzae, was a milestone in both genomics and microbiology [1]. Since the first report of the complete sequence of H. influenzae, these methodologies have been adopted by laboratories around the world. The complete genomic sequences of five eubacterial species [1–5], one archaeon [6], and the eukaryote Saccharomyces cerevisiae [7] have been reported in the last 18 months. At the beginning of 1997, more than a dozen microbial genome projects are at or near completion, with many others in progress. It is likely that the next few years will bring the complete sequences of as many as 30–40 microbial genomes. In this article, we review methods for whole genome sequencing and analysis and examine how this information can be exploited to better understand microbial physiology and evolution.

[1] O. Hino, et al. Isolation of Genes Differentially Expressed between the Yoshida Sarcoma and Long‐survival Yoshida Sarcoma Variants: Origin of Yoshida Sarcoma Revisited, 1994, Japanese journal of cancer research : Gann.

[2] E. Lander, et al. Genomic mapping by fingerprinting random clones: a mathematical analysis, 1988, Genomics.

[3] J. Craig Venter, et al. Rapid cDNA sequencing (expressed sequence tags) from a directionally cloned human infant brain cDNA library, 1993, Nature Genetics.

[4] E. Smeland, et al. Solid-phase method for differential display of genes expressed in hematopoietic stem cells, 1996, BioTechniques.

[5] Natalia Ivanova, et al. The metabolic pathway collection: an update, 1997, Nucleic Acids Res.

[6] Douglas L. Brutlag, et al. BLAZE™: An Implementation of the Smith-Waterman Sequence Comparison Algorithm on a Massively Parallel Computer, 1993, Comput. Chem.

[7] D. Botstein, et al. Functional analysis reports. Precise gene disruption in Saccharomyces cerevisiae by double fusion polymerase chain reaction, 1995, Yeast.

[8] K. Oda, et al. Gene organization deduced from the complete sequence of liverwort Marchantia polymorpha mitochondrial DNA. A primitive form of plant mitochondrial genome, 1992, Journal of molecular biology.

[9] A. Gottlieb, et al. Identification of aberrantly regulated genes in diseased skin using the cDNA differential display technique, 1997, The Journal of investigative dermatology.

[10] R. Fleischmann, et al. Initial assessment of human gene diversity and expression patterns based upon 83 million nucleotides of cDNA sequence, 1995, Nature.

[11] T. Ito, et al. Fluorescent differential display analysis of gene expression in differentiating neuroblastoma cells, 1997, Gene.

[12] E. Myers, et al. Basic local alignment search tool, 1990, Journal of molecular biology.

[13] A. Kerlavage, et al. Complementary DNA sequencing: expressed sequence tags and human genome project, 1991, Science.

[14] J. Craig Venter, et al. 3,400 new expressed sequence tags identify diversity of transcripts in human brain, 1993, Nature Genetics.

[15] Ronald W. Davis, et al. Quantitative Monitoring of Gene Expression Patterns with a Complementary DNA Microarray, 1995, Science.

[16] R. Fleischmann, et al. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd, 1995, Science.

[17] Owen White, et al. TIGR Assembler: A New Tool for Assembling Large Shotgun Sequencing Projects, 1995.

[18] M. Wigler, et al. Cloning the differences between two complex genomes, 1993, Science.

[19] K. Ozaki, et al. Isolation of three testis-specific genes (TSA303, TSA806, TSA903) by a differential mRNA display method, 1996, Genomics.

[20] K. Kinzler, et al. Serial Analysis of Gene Expression, 1995, Science.

[21] M. Kuwano, et al. Increased expression of T‐plastin gene in cisplatin‐resistant human cancer cells: identification by mRNA differential display, 1996, FEBS letters.

[22] Y. Nakamura, et al. Sequence analysis of the genome of the unicellular cyanobacterium Synechocystis sp. strain PCC6803. II. Sequence determination of the entire genome and assignment of potential protein-coding regions (supplement), 1996, DNA research : an international journal for rapid publication of reports on genes and genomes.

[23] L. Penland, et al. Use of a cDNA microarray to analyse gene expression patterns in human cancer, 1996, Nature Genetics.

[24] H. Hilbert, et al. Complete sequence analysis of the genome of the bacterium Mycoplasma pneumoniae, 1996, Nucleic acids research.

[25] M. Adams, et al. Differential Gene Expression Profiles in G1 and S Phase Synchronized Jurkat T Cell Leukemia Cells: Investigation Using an Expressed Sequence Tag Analysis, 1996.

[26] J. Craig Venter, et al. Sequence identification of 2,375 human brain genes, 1992, Nature.

[27] F. Sanger, et al. Nucleotide sequence of bacteriophage lambda DNA, 1982, Journal of molecular biology.

[28] S. Goebel, et al. The complete DNA sequence of vaccinia virus, 1990, Virology.

[29] A. Kerlavage, et al. Potential virulence determinants in terminal regions of variola smallpox virus genome, 1993, Nature.

[30] R. Fleischmann, et al. The Minimal Gene Complement of Mycoplasma genitalium, 1995, Science.

[31] M. Jones, et al. The identification of novel gene sequences of the human adult testis, 1994, Genomics.

[32] S. Tsui, et al. A catalogue of genes in the cardiovascular system as identified by expressed sequence tags, 1994, Proceedings of the National Academy of Sciences of the United States of America.

[33] C. Hutchison, et al. The DNA sequence of the human cytomegalovirus genome, 1991, DNA sequence : the journal of DNA sequencing and mapping.

[34] C. Denny, et al. EAT-2 is a novel SH2 domain containing protein that is up regulated by Ewing's sarcoma EWS/FLI1 fusion gene, 1996, Oncogene.

[35] M. Riley, et al. Functions of the gene products of Escherichia coli, 1993, Microbiological reviews.

[36] F. Blattner, et al. Global regulation of gene expression in Escherichia coli, 1993, Journal of bacteriology.

[37] F. Quinn, et al. In search of virulence factors of human bacterial disease, 1997, Trends in microbiology.

[38] B. Barrell, et al. Life with 6000 Genes, 1996, Science.

[39] R. Staden, et al. The C. elegans genome sequencing project: a beginning, 1992, Nature.

[40] G. Bell, et al. A molecular inventory of human pancreatic islets: sequence analysis of 1000 cDNA clones, 1993, Human molecular genetics.

[41] M. Hubank, et al. Identifying differences in mRNA expression by representational difference analysis of cDNA, 1994, Nucleic acids research.

[42] A. Pardee, et al. Differential display of eukaryotic messenger RNA by means of the polymerase chain reaction, 1992, Science.

[43] Kousaku Okubo, et al. Large scale cDNA sequencing for analysis of quantitative and qualitative aspects of gene expression, 1992, Nature Genetics.

[44] M. S. Waterman, et al. Identification of common molecular subsequences, 1981, Journal of molecular biology.

[45] Ronald W. Davis, et al. Quantitative phenotypic analysis of yeast deletion mutants using a highly parallel molecular bar-coding strategy, 1996, Nature Genetics.

[46] K. Wong, et al. Stress-inducible gene of Salmonella typhimurium identified by arbitrarily primed PCR of RNA, 1994, Proceedings of the National Academy of Sciences of the United States of America.

[47] Wei Zhou, et al. Characterization of the Yeast Transcriptome, 1997, Cell.

[48] M. Adams, et al. Comparative expressed-sequence-tag analysis of differential gene expression profiles in PC-12 cells before and after nerve growth factor treatment, 1995, Proceedings of the National Academy of Sciences of the United States of America.

[49] M. Dron, et al. Visualization of viral candidate cDNAs in infectious brain fractions from Creutzfeldt-Jakob disease by representational difference analysis, 1996, Journal of neurovirology.