Assessing the Robustness of Complete Bacterial Genome Segmentations

Comparison of closely related bacterial genomes has revealed highly conserved sequences that form a "backbone" interrupted by numerous less conserved DNA fragments. Segmenting bacterial genomes into backbone and variable regions is particularly useful for investigating bacterial genome evolution. Several software tools have been designed to compare complete bacterial chromosomes, and a few online databases store pre-computed genome comparisons. However, very few statistical methods are available to evaluate the reliability of these tools or to compare the results they produce. To fill this gap, we have developed two local scores that measure the robustness of bacterial genome segmentations. Our method uses a simulation procedure based on random perturbations of the compared genomes. The scores presented in this paper are simple to implement, and our results show that they make it easy to discriminate between robust and non-robust bacterial genome segmentations when using aligners such as MAUVE and MGA.
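The general idea of a perturbation-based local score can be illustrated with a minimal Python sketch. It does not reproduce the two scores defined in the paper, whose exact form is not given here. Everything in it is an assumption made for illustration: the substitution-only perturbation model, the parameters `rate` and `n_sim`, and the function `toy_segment`, a hypothetical exact k-mer matcher standing in for a real aligner such as MAUVE or MGA. For each position of the first genome, the score is the fraction of simulations in which the backbone/variable label matches the label obtained on the unperturbed genomes.

```python
import random


def perturb(seq, rate, rng):
    """Substitute each base independently with probability `rate`.

    Assumed perturbation model: substitutions only, so coordinates are preserved.
    """
    bases = "ACGT"
    out = []
    for b in seq:
        if rng.random() < rate:
            out.append(rng.choice([x for x in bases if x != b]))
        else:
            out.append(b)
    return "".join(out)


def toy_segment(ref, other, k=8):
    """Hypothetical stand-in for an aligner.

    Labels a position of `ref` as backbone (True) if it is covered by an
    exact k-mer that also occurs in `other`; otherwise variable (False).
    """
    kmers = {other[i:i + k] for i in range(len(other) - k + 1)}
    backbone = [False] * len(ref)
    for i in range(len(ref) - k + 1):
        if ref[i:i + k] in kmers:
            for j in range(i, i + k):
                backbone[j] = True
    return backbone


def robustness_scores(ref, other, n_sim=20, rate=0.01, seed=0, segment=toy_segment):
    """Per-position agreement between the original and perturbed segmentations."""
    rng = random.Random(seed)
    original = segment(ref, other)
    agree = [0] * len(ref)
    for _ in range(n_sim):
        # Perturb both compared genomes, then segment again.
        labels = segment(perturb(ref, rate, rng), perturb(other, rate, rng))
        for i, (a, b) in enumerate(zip(original, labels)):
            agree[i] += (a == b)
    return [count / n_sim for count in agree]
```

In this sketch, positions whose score stays close to 1 keep the same label under perturbation, while low scores flag regions where the segmentation is fragile. A real aligner could be substituted by passing a different `segment` function with the same signature.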
