M-GCAT: interactively and efficiently constructing large-scale multiple genome comparison frameworks in closely related species

Background: Due to advances in whole genome shotgun sequencing and assembly technologies, the financial cost of decoding an organism's DNA has been drastically reduced, resulting in a recent explosion of genomic sequencing projects. This increase in related genomic data will allow for in-depth studies of evolution in closely related species through multiple whole genome comparisons.

Results: To facilitate such comparisons, we present an interactive multiple genome comparison and alignment tool, M-GCAT, that can efficiently construct multiple genome comparison frameworks in closely related species. M-GCAT is able to compare and identify highly conserved regions in up to 20 closely related bacterial species in minutes on a standard computer, and as many as 90 (containing 75 cloned genomes from a set of 15 published enterobacterial genomes) in an hour. M-GCAT also incorporates a novel comparative genomics data visualization interface that allows the user to inspect the conserved regions and gene annotations both globally and locally.

Conclusion: M-GCAT is an interactive comparative genomics tool well suited for quickly generating multiple genome comparison frameworks and alignments among closely related species. M-GCAT is freely available for download for academic and non-commercial use at: http://alggen.lsi.upc.es/recerca/align/mgcat/intro-mgcat.html.
