Computing the Assignment of Orthologous Genes via Genome Rearrangement

The assignment of orthologous genes between a pair of genomes is a fundamental and challenging problem in comparative genomics. Existing methods that assign orthologs based on the similarity between DNA or protein sequences may make erroneous assignments when sequence similarity does not clearly delineate the evolutionary relationship among genes of the same family. In this paper, we present a new approach to ortholog assignment that takes into account both sequence similarity and evolutionary events at the genome level, where orthologous genes are assumed to correspond to each other in the most parsimonious evolutionary scenario under genome rearrangement. The problem is then formulated as computing the signed reversal distance with duplicates between the two genomes of interest, for which we give an efficient heuristic algorithm by introducing two new optimization problems, minimum common partition and maximum cycle decomposition. Following this approach, we have implemented a high-throughput system for assigning orthologs on a genome scale, called SOAR, and tested it on both simulated data and real genome sequence data. Compared with a recent ortholog assignment method based entirely on homology search (called INPARANOID), SOAR shows marginally better sensitivity on the real data set because it identified several correct orthologous pairs that INPARANOID missed. The simulation results demonstrate that SOAR generally performs better than the iterated exemplar algorithm, both in computing the reversal distance and in assigning correct orthologs.
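To make the formulation concrete, the following Python sketch treats each genome as a list of signed gene-family labels and tries every way of matching duplicate copies between the two genomes. Each matching turns genome A into a signed permutation relative to genome B, which the sketch scores with the breakpoint-graph lower bound n + 1 - c, where c is the number of alternating cycles. The Hannenhalli-Pevzner formula adds hurdle and fortress terms, which are deliberately omitted here. This is a brute-force illustration, exponential in family sizes, that assumes both genomes have equal gene content; it is not the SOAR heuristic, and the function names `cycle_count` and `best_assignment` are hypothetical.

```python
from collections import defaultdict
from itertools import permutations, product


def cycle_count(perm):
    """Count alternating cycles in the breakpoint graph of a signed permutation."""
    n = len(perm)
    # Standard doubling: +x -> (2x-1, 2x), -x -> (2x, 2x-1), framed by 0 and 2n+1.
    seq = [0]
    for x in perm:
        a = abs(x)
        seq += [2 * a - 1, 2 * a] if x > 0 else [2 * a, 2 * a - 1]
    seq.append(2 * n + 1)
    # Black edges join positions 2i and 2i+1 of the doubled sequence.
    black = {}
    for i in range(n + 1):
        a, b = seq[2 * i], seq[2 * i + 1]
        black[a], black[b] = b, a
    seen, cycles = set(), 0
    for start in range(2 * n + 2):
        if start in seen:
            continue
        cycles += 1
        v = start
        while True:
            seen.add(v)
            u = black[v]
            seen.add(u)
            v = u ^ 1  # gray edge joins values 2i and 2i+1
            if v == start:
                break
    return cycles


def best_assignment(genome_a, genome_b):
    """Hypothetical brute force: pick the duplicate matching with the smallest
    cycle-based reversal-distance lower bound (hurdles/fortresses ignored)."""
    pos_a, pos_b = defaultdict(list), defaultdict(list)
    for i, g in enumerate(genome_a):
        pos_a[abs(g)].append(i)
    for j, g in enumerate(genome_b):
        pos_b[abs(g)].append(j)
    # Assumption: every family has the same number of copies in both genomes.
    if {f: len(p) for f, p in pos_a.items()} != {f: len(p) for f, p in pos_b.items()}:
        raise ValueError("this sketch assumes equal gene content")
    families = sorted(pos_a)
    n = len(genome_a)
    best = None
    # One permutation of B's copies per family = one candidate ortholog assignment.
    for choice in product(*(permutations(pos_b[f]) for f in families)):
        perm, pairs = [0] * n, []
        for f, targets in zip(families, choice):
            for i, j in zip(pos_a[f], targets):
                # Relabel so B reads +1..+n; relative orientation gives the sign.
                sign = 1 if (genome_a[i] > 0) == (genome_b[j] > 0) else -1
                perm[i] = sign * (j + 1)
                pairs.append((i, j))
        bound = n + 1 - cycle_count(perm)
        if best is None or bound < best[0]:
            best = (bound, sorted(pairs))
    return best


if __name__ == "__main__":
    # Family 2 is duplicated; gene 3 is inverted in genome A.
    print(best_assignment([1, 2, -3, 2], [1, 2, 3, 2]))
    # → (1, [(0, 0), (1, 1), (2, 2), (3, 3)])
```

On this toy pair, matching the two copies of family 2 in order gives a bound of 1 (a single reversal of gene 3), while crossing them gives 3, so the in-order matching is reported as the ortholog assignment. Choosing the matching that maximizes the number of cycles is only in the spirit of the maximum cycle decomposition problem named above; a genome-scale system needs a heuristic rather than this exhaustive enumeration.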
