An alternative approach to multiple genome comparison

Genome comparison is now a crucial step for genome annotation and identification of regulatory motifs. It aims, for instance, at finding genomic regions that are either specific to, or in one-to-one correspondence between, individuals, strains, or species. It serves, for example, to pre-annotate a new genome by automatically transferring annotations from a known one. However, the efficiency, flexibility, and objectives of current methods do not suit the whole spectrum of applications, genome sizes, and genome organizations, so innovative approaches are still needed. Hence, we propose an alternative way of comparing multiple genomes based on segmentation by similarity. In this framework, rather than being formulated as a complex optimization problem, genome comparison is seen as a segmentation question for which a single optimal solution can be found in almost linear time. We apply our method to analyse three strains of a virulent pathogenic bacterium, Ehrlichia ruminantium, and identify 92 new genes. We also find that a substantial number of genes thought to be strain-specific have potential orthologs in the other strains. Our solution is implemented in an efficient program, qod, which is equipped with a user-friendly interface and enables the automatic transfer of annotations between compared genomes or contigs (Video in Supplementary Data). Because it largely disregards the relative order of genomic blocks, qod can handle unfinished genomes, a feature that may become increasingly useful given the difficulty of completing genome sequencing. Availability: http://www.atgc-montpellier.fr/qod.
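To make the idea of segmentation by similarity more concrete, the sketch below partitions a target genome into maximal segments according to which other genomes share a similar region there. It is only an illustration under stated assumptions, not the algorithm implemented in qod: the function name `segment_by_similarity`, the `(start, end, genome)` hit format, and the sweep over sorted endpoints are all hypothetical choices made for this example. The sweep costs one sort plus a linear pass over the endpoints.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical input format: a similarity hit covers target positions
# [start, end) and is labelled with the genome it comes from.
Hit = Tuple[int, int, str]
Segment = Tuple[int, int, FrozenSet[str]]


def segment_by_similarity(length: int, hits: List[Hit]) -> List[Segment]:
    """Split [0, length) into maximal segments, each labelled with the set
    of genomes that have at least one similarity hit covering it."""
    # Turn every hit into an opening (+1) and a closing (-1) event.
    events: List[Tuple[int, int, str]] = []
    for start, end, genome in hits:
        events.append((start, 1, genome))
        events.append((end, -1, genome))
    events.sort(key=lambda event: event[0])

    counts: Dict[str, int] = {}   # number of open hits per genome
    segments: List[Segment] = []

    def emit(start: int, end: int) -> None:
        label = frozenset(g for g, c in counts.items() if c > 0)
        # Merge with the previous segment when the label does not change.
        if segments and segments[-1][2] == label:
            segments[-1] = (segments[-1][0], end, label)
        else:
            segments.append((start, end, label))

    position = 0
    i = 0
    while i < len(events):
        coordinate = events[i][0]
        if coordinate > position:
            emit(position, coordinate)
            position = coordinate
        # Apply every event located at this coordinate before moving on.
        while i < len(events) and events[i][0] == coordinate:
            _, delta, genome = events[i]
            counts[genome] = counts.get(genome, 0) + delta
            i += 1
    if position < length:
        emit(position, length)
    return segments


if __name__ == "__main__":
    # Toy data: hits from two other genomes, "B" and "C", on a 100 bp target.
    toy_hits = [(10, 50, "B"), (30, 70, "C"), (40, 60, "B")]
    for start, end, genomes in segment_by_similarity(100, toy_hits):
        print(start, end, sorted(genomes))
```

In this toy run, segments labelled with an empty set would correspond to regions specific to the target, while segments labelled with every compared genome would correspond to regions shared by all of them. Because each segment is determined only by coordinates on the target, the relative order of the matching blocks in the other genomes plays no role in this sketch.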
