Inferring Positional Homologs with Common Intervals of Sequences

Inferring orthologous and paralogous genes is an important problem in whole-genome comparisons, for both functional and evolutionary studies. In this paper, we introduce a new approach for inferring candidate pairs of orthologous genes between genomes, also called positional homologs, based on the conservation of the genomic context. We represent genomes by their gene order (i.e. as sequences of signed integers) and use common intervals of these sequences as the anchors of the final gene matching. We show that the natural combinatorial problem of computing a maximal cover of the two genomes using the minimum number of common intervals is NP-complete, and we give a simple heuristic for this problem. We illustrate the effectiveness of this first approach based on common intervals of sequences on two datasets: 8 γ-proteobacterial genomes, and the human and mouse whole genomes.
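
To make the notion of a common interval of two sequences concrete, the following is a minimal sketch and not the method of the paper. It assumes that each gene is a signed integer whose absolute value identifies its gene family, that signs are ignored when comparing gene content, and that a common interval is a pair of maximal substrings, one in each genome, with the same set of families. The function name `common_intervals` and the parameter `min_size` are hypothetical, and the brute-force enumeration is chosen for readability rather than efficiency.

```python
from collections import defaultdict


def common_intervals(genome1, genome2, min_size=2):
    """Map each shared family set to its maximal locations in both genomes.

    Illustrative brute-force sketch (assumption: signs are ignored and
    abs(gene) names the gene family). Locations are 0-based, inclusive
    (start, end) index pairs.
    """

    def maximal_locations(genome):
        families = [abs(g) for g in genome]
        n = len(families)
        table = defaultdict(list)
        for i in range(n):
            seen = set()
            for j in range(i, n):
                seen.add(families[j])
                # Keep the substring only if it cannot be extended to the
                # left or to the right without changing its family set.
                left_ok = i == 0 or families[i - 1] not in seen
                right_ok = j == n - 1 or families[j + 1] not in seen
                if left_ok and right_ok:
                    table[frozenset(seen)].append((i, j))
        return table

    t1 = maximal_locations(genome1)
    t2 = maximal_locations(genome2)
    shared = t1.keys() & t2.keys()
    return {s: (t1[s], t2[s]) for s in shared if len(s) >= min_size}


if __name__ == "__main__":
    # Toy genomes: gene 2 is inverted in the first one, gene 5 is absent from it.
    found = common_intervals([1, -2, 3, 4], [3, 2, 1, 5, 4])
    for families, (locs1, locs2) in sorted(found.items(), key=lambda kv: sorted(kv[0])):
        print(sorted(families), locs1, locs2)
```

This sketch only enumerates candidate anchors. Selecting a small set of common intervals that covers the two genomes, which is the problem shown above to be NP-complete, and deriving a gene matching from that cover are not implemented here.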
