An Applications-focused Review of Comparative Genomics Tools: Capabilities, Limitations and Future Challenges

A team at the Lawrence Livermore National Laboratory (LLNL) was tasked with using computational tools to speed up the development of DNA diagnostics for pathogen detection. That work is described in another paper in this issue (see pages 133-149). Achieving this goal required an understanding of the merits and limitations of the available comparative genomics tools. We present a review of some recent tools for multisequence/genome alignment and substring comparison, within the general framework of applicability to a large-scale application. We note that genome alignments serve many purposes, of which pathogen detection is only one: studies of gene function, gene regulation, gene networks, phylogeny and other aspects of evolution all depend on accurate nucleic acid and protein sequence alignment. Selecting appropriate tools can make a large difference in the quality of the results obtained and the effort required.
