Phylogenetic distances are encoded in networks of interacting pathways

MOTIVATION: Although metabolic reactions are unquestionably shaped by evolutionary processes, the degree to which the overall structure and complexity of their interconnections are linked to the phylogeny of species has not been evaluated in depth. Here, we apply an original metabolome representation, termed Network of Interacting Pathways (NIP), together with a combination of graph-theoretical and machine learning strategies, to address this question. NIPs compress the information in a species' metabolic network into a much smaller network of overlapping metabolic pathways, where nodes are pathways and links are the metabolites they exchange.

RESULTS: Our analysis shows that a small set of descriptors of the structure and complexity of the NIPs, combined into regression models, accurately reproduces reference phylogenetic distances derived from 16S rRNA sequences (10-fold cross-validation correlation coefficient higher than 0.9). Our method also scored better than previous work on metabolism-based phylogenetic reconstructions, as assessed by branch distances score, topological similarity and second cousins score. Thus, our representation of the metabolome as a network of overlapping metabolic pathways captures sufficient information about the evolutionary events underlying the formation of metabolic networks and species phylogeny. Notably, these reconstructions do not require precise knowledge of all of the reactions in these pathways. These observations underscore the potential of abstract, modular representations of metabolic reactions as tools for studying the evolution of species.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
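
To make the NIP idea concrete, the following is a minimal sketch rather than the authors' implementation. It assumes that two pathways are linked whenever they share at least one metabolite (a simplification of "metabolites they exchange"), and it computes a few generic graph descriptors (node count, edge count, mean degree, density) that are illustrative only and are not necessarily the descriptors used in the paper. The function names `build_nip` and `describe`, and the toy pathway data, are hypothetical.

```python
from itertools import combinations


def build_nip(pathways):
    """Build a toy Network of Interacting Pathways.

    pathways: dict mapping pathway name -> set of metabolite names.
    Returns a dict mapping a sorted (pathway, pathway) pair to the set of
    metabolites the two pathways share. Assumption: sharing a metabolite
    stands in for "exchanging" it.
    """
    edges = {}
    for p, q in combinations(sorted(pathways), 2):
        shared = pathways[p] & pathways[q]
        if shared:
            edges[(p, q)] = shared
    return edges


def describe(pathways, edges):
    """Compute simple, illustrative structural descriptors of a NIP."""
    n = len(pathways)
    m = len(edges)
    max_edges = n * (n - 1) / 2
    return {
        "nodes": n,
        "edges": m,
        "mean_degree": 2 * m / n if n else 0.0,
        "density": m / max_edges if max_edges else 0.0,
    }


# Hypothetical toy data: three pathways and the metabolites they involve.
toy = {
    "glycolysis": {"glucose", "pyruvate", "ATP"},
    "tca_cycle": {"pyruvate", "citrate", "ATP"},
    "urea_cycle": {"ornithine", "citrulline"},
}
nip = build_nip(toy)
stats = describe(toy, nip)
print(nip)
print(stats)
```

In a workflow like the one described above, one such descriptor vector would be computed per species, and pairwise combinations of those vectors would then serve as inputs to a regression model trained against 16S rRNA reference distances; that regression step is not shown here.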
