CGAT: a comparative genome analysis tool for visualizing alignments in the analysis of complex evolutionary changes between closely related genomes

Background: The recent accumulation of closely related genomic sequences provides a valuable resource for elucidating the evolutionary histories of various organisms. However, although numerous alignment calculation and visualization tools have been developed to date, the analysis of complex genomic changes, such as large insertions, deletions, inversions, translocations and duplications, remains difficult.

Results: We have developed a comparative genome analysis tool, named CGAT, which allows detailed comparisons of closely related bacteria-sized genomes, mainly by visualizing medium- to large-scale changes so that the underlying mechanisms can be inferred. CGAT displays precomputed pairwise genome alignments in both dotplot and alignment viewers with scrolling and zooming functions, and allows users to move along the pre-identified orthologous alignments. Users can overlay several types of information on an alignment, such as the presence of tandem repeats or interspersed repetitive sequences and changes in G+C content or codon usage bias, which facilitates the interpretation of the observed genomic changes. In addition to displaying precomputed alignments, the viewer can dynamically calculate alignments between specified regions; this feature is especially useful for examining alignment boundaries, which are often obscure and can vary between programs. Besides the alignment browser, CGAT also includes an alignment data construction module, which provides procedures commonly used for pre- and post-processing in large-scale alignment calculation, such as the split-and-merge protocol for calculating long alignments, chaining of adjacent alignments, and ortholog identification. CGAT thus provides a general framework for calculating genome-scale alignments using various existing programs as alignment engines, which allows users to compare the outputs of different alignment programs. Earlier versions of this program have been used successfully in our research to infer the evolutionary history of apparently complex genome changes between closely related eubacteria and archaea.

Conclusion: CGAT is a practical tool for analyzing complex genomic changes between closely related genomes using existing alignment programs and other sequence analysis tools combined with extensive manual inspection.
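The chaining of adjacent alignments mentioned in the Results can be illustrated with a minimal sketch. This is not CGAT's actual procedure, which the abstract does not describe; it is a simple greedy merge under stated assumptions: all blocks lie on the forward strand, coordinates are half-open, and two blocks are chained when the gap between them on both genomes is non-negative and at most `max_gap`. The names `Aln`, `chain_adjacent` and `max_gap` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Aln:
    """One pairwise alignment block (assumed forward strand, half-open coordinates)."""
    s1: int  # start on genome 1
    e1: int  # end on genome 1
    s2: int  # start on genome 2
    e2: int  # end on genome 2


def chain_adjacent(alns: List[Aln], max_gap: int = 100) -> List[Aln]:
    """Greedily merge collinear blocks whose gaps on both genomes are within max_gap.

    Illustrative only: blocks are sorted by start position, and each block is
    compared with the most recent chain. Overlapping blocks (negative gap) and
    blocks that are far apart on either genome start a new chain.
    """
    chains: List[Aln] = []
    for a in sorted(alns, key=lambda x: (x.s1, x.s2)):
        if chains:
            last = chains[-1]
            gap1 = a.s1 - last.e1  # unaligned stretch on genome 1
            gap2 = a.s2 - last.e2  # unaligned stretch on genome 2
            if 0 <= gap1 <= max_gap and 0 <= gap2 <= max_gap:
                # Extend the current chain to cover the new block.
                chains[-1] = Aln(last.s1, a.e1, last.s2, a.e2)
                continue
        chains.append(a)
    return chains


blocks = [
    Aln(0, 100, 0, 100),
    Aln(120, 200, 110, 190),    # close to the first block on both genomes
    Aln(500, 600, 900, 1000),   # far away: likely a separate rearranged segment
]
chained = chain_adjacent(blocks)
```

In this example the first two blocks are merged into one chain, while the third stays separate because its distance from the chain exceeds `max_gap`. A real implementation would also need to handle reverse-strand blocks and overlapping alignments, which this sketch deliberately leaves out.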
