Evaluating synteny for improved comparative studies

Motivation: Comparative genomics aims to understand the structure and function of genomes by translating knowledge gained about some genomes to the object of study. Early approaches used pairwise comparisons, but today researchers are attempting to leverage the larger potential of multi-way comparisons. Comparative genomics relies on the structuring of genomes into syntenic blocks: blocks of sequence that exhibit conserved features across the genomes. Syntenic blocs are required for complex computations to scale to the billions of nucleotides present in many genomes; they enable comparisons across broad ranges of genomes because they filter out much of the individual variability; they highlight candidate regions for in-depth studies; and they facilitate whole-genome comparisons through visualization tools. However, the concept of syntenic block remains loosely defined. Tools for the identification of syntenic blocks yield quite different results, thereby preventing a systematic assessment of the next steps in an analysis. Current tools do not include measurable quality objectives and thus cannot be benchmarked against themselves. Comparisons among tools have also been neglected—what few results are given use superficial measures unrelated to quality or consistency. Results: We present a theoretical model as well as an experimental basis for comparing syntenic blocks and thus also for improving or designing tools for the identification of syntenic blocks. We illustrate the application of the model and the measures by applying them to syntenic blocks produced by three different contemporary tools (DRIMM-Synteny, i-ADHoRe and Cyntenator) on a dataset of eight yeast genomes. Our findings highlight the need for a well founded, systematic approach to the decomposition of genomes into syntenic blocks. Our experiments demonstrate widely divergent results among these tools, throwing into question the robustness of the basic approach in comparative genomics. We have taken the first step towards a formal approach to the construction of syntenic blocks by developing a simple quality criterion based on sound evolutionary principles. Contact: cristinagabriela.ghiurcuta@epfl.ch

[1]  Benedict Paten,et al.  Sequence progressive alignment, a framework for practical large-scale probabilistic consistency alignment , 2009, Bioinform..

[2]  Todd J. Vision,et al.  Fast identification and statistical evaluation of segmental homologies in comparative maps , 2003, ISMB.

[3]  W. Fitch Homology a personal view on some of the problems. , 2000, Trends in genetics : TIG.

[4]  Jian Pei,et al.  OrthoCluster: a new tool for mining synteny blocks and applications in comparative genomics , 2008, EDBT '08.

[5]  E. Koonin,et al.  Functional and evolutionary implications of gene orthology , 2013, Nature Reviews Genetics.

[6]  Nikolay Vyahhi,et al.  Sibelia: A Scalable and Comprehensive Synteny Block Generation Tool for Closely Related Microbial Genomes , 2013, WABI.

[7]  P. Pevzner,et al.  Genome rearrangements in mammalian evolution: lessons from human and mouse genomes. , 2003, Genome research.

[8]  Y. van de Peer,et al.  i-ADHoRe 3.0—fast and sensitive detection of genomic homology in extremely large data sets , 2011, Nucleic acids research.

[9]  Zanoni Dias,et al.  Cassis: detection of genomic rearrangement breakpoints , 2010, Bioinform..

[10]  W. Pearson Empirical statistical estimates for sequence similarity searches. , 1998, Journal of molecular biology.

[11]  Steven J. M. Jones,et al.  Circos: an information aesthetic for comparative genomics. , 2009, Genome research.

[12]  Colin N. Dewey,et al.  Initial sequencing and comparative analysis of the mouse genome. , 2002 .

[13]  Pavel A Pevzner,et al.  How to apply de Bruijn graphs to genome assembly. , 2011, Nature biotechnology.

[14]  N. Perna,et al.  progressiveMauve: Multiple Genome Alignment with Gene Gain, Loss and Rearrangement , 2010, PloS one.

[15]  Tao Jiang,et al.  MSOAR: A High-Throughput Ortholog Assignment System Based on Genome Rearrangement , 2007, J. Comput. Biol..

[16]  Jens Stoye,et al.  Common intervals and sorting by reversals: a marriage of necessity , 2002, ECCB.

[17]  E. Myers,et al.  Basic local alignment search tool. , 1990, Journal of molecular biology.

[18]  C. Dieterich,et al.  CYNTENATOR: Progressive Gene Order Alignment of 17 Vertebrate Genomes , 2010, PloS one.

[19]  R. White Mapping human chromosomes. , 1984, Harvey lectures.

[20]  Pavel A. Pevzner,et al.  DRIMM-Synteny: decomposing genomes into evolutionary conserved segments , 2010, Bioinform..

[21]  Evgeny M. Zdobnov,et al.  OrthoDB: the hierarchical catalog of eukaryotic orthologs in 2011 , 2010, Nucleic Acids Res..

[22]  Evgeny M. Zdobnov,et al.  OrthoDB: the hierarchical catalog of eukaryotic orthologs , 2007, Nucleic Acids Res..

[23]  J. Raes,et al.  The automatic detection of homologous regions (ADHoRe) and its application to microcolinearity between Arabidopsis and rice. , 2002, Genome research.

[24]  Kevin P. Byrne,et al.  The Yeast Gene Order Browser: combining curated homology and syntenic context reveals gene fate in polyploid species. , 2005, Genome research.

[25]  Katharina Jahn Efficient Computation of Approximate Gene Clusters Based on Reference Occurrences , 2011, J. Comput. Biol..

[26]  P. Pevzner,et al.  Reconstructing the genomic architecture of ancestral mammals: lessons from human, mouse, and rat genomes. , 2004, Genome research.

[27]  J. Nadeau,et al.  Lengths of chromosomal segments conserved since divergence of man and mouse. , 1984, Proceedings of the National Academy of Sciences of the United States of America.

[28]  Michael S. Waterman,et al.  Computational Genome Analysis: An Introduction , 2007 .