OMG! Orthologs in Multiple Genomes - Competing Graph-Theoretical Formulations

From the set of all pairwise homologies among a set of genomes, weighted by sequence similarity, we seek disjoint orthology sets of genes, in which each element is orthologous to all other genes (on a different genome) in the same set. In a graph-theoretical formulation, where genes are vertices and weighted edges represent homologies, we suggest three criteria, each with a different biological motivation, for evaluating the partition of genes produced by deleting a subset of edges: i) minimum total weight of the removed edges, ii) minimum number of degree-zero vertices created, and iii) maximum number of edges in the transitive closure of the graph after edge deletion. Each of these problems is either proved or conjectured to be NP-hard. For each, we suggest approximation and heuristic algorithms for finding orthology sets that satisfy the criterion, and show how to incorporate genomes that have a whole genome duplication event in their immediate lineage. We apply these methods to ten flowering plant genomes, involving 160,000 different genes in the given pairwise homologies. We evaluate the results in a number of ways and recommend criterion iii) as best suited to applications to multiple gene order alignment.
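To make the three criteria concrete, the following minimal sketch scores one candidate edge deletion on a toy homology graph. It is an illustration under stated assumptions, not the authors' algorithm: the function names (`components`, `score_deletion`), the gene labels and the edge weights are all hypothetical, and criterion ii) is read here as counting vertices that had at least one homology edge before deletion and none afterwards. For criterion iii) it uses the fact that the transitive closure of an undirected graph turns each connected component of size k into a clique with k(k-1)/2 edges.

```python
from collections import defaultdict


def components(vertices, edges):
    """Connected components of an undirected graph, via union-find."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v, _w in edges:
        parent[find(u)] = find(v)

    groups = defaultdict(set)
    for v in vertices:
        groups[find(v)].add(v)
    return list(groups.values())


def score_deletion(vertices, edges, deleted):
    """Score a set of deleted edges under the three criteria (hypothetical helper).

    edges   : list of (gene_u, gene_v, weight) homologies
    deleted : set of frozenset({gene_u, gene_v}) pairs to remove
    """
    kept = [e for e in edges if frozenset(e[:2]) not in deleted]

    # i) total weight of the removed edges (to be minimized)
    removed_weight = sum(w for u, v, w in edges if frozenset((u, v)) in deleted)

    # ii) vertices that had a homology before deletion but have none after
    #     (assumed reading of "degree-zero vertex creation"; to be minimized)
    had_edge = {x for u, v, _w in edges for x in (u, v)}
    has_edge = {x for u, v, _w in kept for x in (u, v)}
    isolated_created = len(had_edge - has_edge)

    # iii) edges in the transitive closure: each component of size k
    #      becomes a clique with k*(k-1)/2 edges (to be maximized)
    closure_edges = sum(len(c) * (len(c) - 1) // 2
                        for c in components(vertices, kept))

    return removed_weight, isolated_created, closure_edges


# Toy data (hypothetical): genes named by genome letter plus an index.
vertices = ["A1", "A2", "B1", "C1"]
edges = [("A1", "B1", 0.9), ("B1", "C1", 0.8), ("A1", "C1", 0.7), ("A2", "B1", 0.3)]
deleted = {frozenset(("A2", "B1"))}

print(score_deletion(vertices, edges, deleted))
```

Deleting the weak edge between `A2` and `B1` separates the two genes from genome A, at the cost of isolating `A2`. Comparing such scores across alternative deletions shows how the three criteria can rank the same partitions differently.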
