The mathematics of xenology: di-cographs, symbolic ultrametrics, 2-structures and tree-representable systems of binary relations

The concepts of orthology, paralogy, and xenology play a key role in molecular evolution. Orthology and paralogy distinguish whether a pair of genes originated by speciation or duplication. The corresponding binary relations on a set of genes form complementary cographs. Allowing more than two types of ancestral events leads to symmetric symbolic ultrametrics. Horizontal gene transfer, which leads to xenologous gene pairs, is however inherently asymmetric, since one offspring copy “jumps” into another genome while the other continues to be inherited vertically. We therefore explore here the mathematical structure of the non-symmetric generalization of symbolic ultrametrics. Our main results tie non-symmetric ultrametrics together with di-cographs (the directed generalization of cographs), so-called uniformly non-prime (unp) 2-structures, and hierarchical structures on the set of strong modules. This yields a characterization of relation structures that can be explained in terms of trees and types of ancestral events. This framework accommodates a horizontal-transfer relation in terms of an ancestral event and thus is slightly different from the most commonly used definition of xenology. As a first step towards practical use, we present a simple polynomial-time recognition algorithm for unp 2-structures and investigate the computational complexity of several types of editing problems for unp 2-structures. Finally, we show that these NP-complete problems can be solved exactly as Integer Linear Programs.
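To make the cograph statement concrete, the following is a minimal sketch of a cograph test for the symmetric case. It relies on the standard characterization that a graph is a cograph exactly when every induced subgraph on more than one vertex is disconnected or has a disconnected complement. The function names `components` and `is_cograph` are illustrative assumptions, and this simple recursive check is not the recognition algorithm for unp 2-structures presented in the paper.

```python
def components(vertices, adj):
    """Connected components of the subgraph induced by `vertices`."""
    vertices = set(vertices)
    seen, comps = set(), []
    for start in vertices:
        if start in seen:
            continue
        comp, stack = {start}, [start]
        while stack:
            u = stack.pop()
            # Only follow edges that stay inside the induced subgraph.
            for v in adj[u] & vertices:
                if v not in comp:
                    comp.add(v)
                    stack.append(v)
        seen |= comp
        comps.append(comp)
    return comps


def is_cograph(vertices, edges):
    """Return True if the undirected graph (vertices, edges) is a cograph."""
    vertices = set(vertices)
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # Adjacency of the complement graph (no self-loops).
    co_adj = {v: vertices - adj[v] - {v} for v in vertices}

    def decompose(vs):
        if len(vs) <= 1:
            return True
        # Try to split along the graph first, then along its complement.
        for a in (adj, co_adj):
            parts = components(vs, a)
            if len(parts) > 1:
                return all(decompose(p) for p in parts)
        # Graph and complement are both connected: no cotree exists.
        return False

    return decompose(vertices)
```

As a usage example, consider a hypothetical gene `x` separated by a speciation from two duplicated genes `y1` and `y2`: the orthology relation has edges `x`-`y1` and `x`-`y2`, and `is_cograph(["x", "y1", "y2"], [("x", "y1"), ("x", "y2")])` accepts it. A path on four vertices, the forbidden induced subgraph of cographs, is rejected.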
