MOSAIC: an online database dedicated to the comparative genomics of bacterial strains at the intra-species level

Background

The recent availability of complete sequences for numerous closely related bacterial genomes opens up new challenges in comparative genomics. Several methods have been developed to align complete genomes at the nucleotide level but their use and the biological interpretation of results are not straightforward. It is therefore necessary to develop new resources to access, analyze, and visualize genome comparisons.

Description

Here we present recent developments on MOSAIC, a generalist comparative bacterial genome database. This database provides the bacteriologist community with easy access to comparisons of complete bacterial genomes at the intra-species level. The strategy we developed for comparison allows us to define two types of regions in bacterial genomes: backbone segments (i.e., regions conserved in all compared strains) and variable segments (i.e., regions that are either specific to or variable in one of the aligned genomes). Definition of these segments at the nucleotide level allows precise comparative and evolutionary analyses of both coding and non-coding regions of bacterial genomes. Such work is easily performed using the MOSAIC Web interface, which allows browsing and graphical visualization of genome comparisons.

Conclusion

The MOSAIC database now includes 493 pairwise comparisons and 35 multiple maximal comparisons representing 78 bacterial species. Genome conserved regions (backbones) and variable segments are presented in various formats for further analysis. A graphical interface allows visualization of aligned genomes and functional annotations. The MOSAIC database is available online at http://genome.jouy.inra.fr/mosaic.
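The backbone/variable distinction can be illustrated with a minimal sketch. It is not the MOSAIC pipeline: it assumes the backbone of one genome is already known as a list of half-open coordinate intervals, treats the chromosome as linear for simplicity, and derives the variable segments as the gaps between backbone intervals. The function name `variable_segments` and the interval representation are hypothetical.

```python
from typing import List, Tuple

# Half-open interval [start, end) in nucleotide coordinates (an assumption
# made for this sketch, not a MOSAIC file format).
Interval = Tuple[int, int]


def variable_segments(genome_length: int, backbone: List[Interval]) -> List[Interval]:
    """Return the regions of a genome not covered by any backbone interval."""
    # Merge overlapping or touching backbone intervals first.
    merged: List[Interval] = []
    for start, end in sorted(backbone):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))

    # Walk along the genome and collect the gaps between backbone intervals.
    variable: List[Interval] = []
    cursor = 0
    for start, end in merged:
        if start > cursor:
            variable.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < genome_length:
        variable.append((cursor, genome_length))
    return variable


if __name__ == "__main__":
    # Toy genome of 100 nt with two conserved blocks.
    print(variable_segments(100, [(10, 30), (50, 80)]))
```

Because bacterial chromosomes are usually circular, a real implementation would also need to join a trailing gap with a leading one when both exist; the sketch leaves them as separate segments.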
