Metabolic networks classification and knowledge discovery by information granulation

Graphs are powerful structures able to capture topological and semantic information from data, which makes them suitable for modelling a plethora of real-world (complex) systems. For this reason, graph-based pattern recognition has gained considerable attention in recent years. In this paper, a general-purpose classification system in the graph domain is presented. When most of the information carried by the available patterns can be encoded in edge labels, an information granulation-based approach is highly discriminant and allows for the identification of semantically meaningful edges. The proposed classification system has been tested on all 5299 organisms for which metabolic networks are known, yielding both a perfect mirroring of the underlying taxonomy and the identification of the most discriminant metabolic reactions and pathways. Because graph (network) structures are widespread in biology, the proposed pattern recognition approach is potentially useful in many different fields of application. More specifically, a reliable metric for comparing different metabolic systems is instrumental in emerging fields such as microbiome analysis and, more generally, for proposing metabolic networks as a universal phenotype that spans the entire tree of life and is in direct contact with environmental cues.
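The abstract does not spell out the granulation procedure, so the sketch below only illustrates the general idea that, when edge labels carry most of the information, each graph can be embedded as a histogram over an alphabet of edge labels, and the labels whose presence best separates two classes can then be ranked. Everything here is an assumption for illustration: graphs are modelled as lists of `(u, v, label)` edges with labels such as reaction identifiers, the helper names (`build_alphabet`, `embed`, `discriminant_labels`, `nearest_neighbour`) are hypothetical, and the frequency-difference ranking and 1-nearest-neighbour classifier are simple stand-ins, not the paper's actual granulation or classification method.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# Assumed representation (hypothetical): an edge is (node_u, node_v, label),
# where the label could be, e.g., a metabolic reaction identifier.
Edge = Tuple[str, str, str]


def build_alphabet(graphs: Sequence[List[Edge]]) -> List[str]:
    """Collect the distinct edge labels, used here as candidate symbols."""
    return sorted({label for g in graphs for _, _, label in g})


def embed(graph: List[Edge], alphabet: List[str]) -> List[int]:
    """Histogram embedding: how many edges carry each alphabet label."""
    counts = Counter(label for _, _, label in graph)
    return [counts[label] for label in alphabet]


def discriminant_labels(vectors: Sequence[List[int]], classes: Sequence[int],
                        alphabet: List[str]) -> List[Tuple[str, float]]:
    """Rank labels by |P(present | class 1) - P(present | class 0)|.

    A simple stand-in score; both classes must be non-empty.
    """
    pos = [v for v, c in zip(vectors, classes) if c == 1]
    neg = [v for v, c in zip(vectors, classes) if c == 0]
    scores = []
    for j, label in enumerate(alphabet):
        p = sum(1 for v in pos if v[j] > 0) / len(pos)
        q = sum(1 for v in neg if v[j] > 0) / len(neg)
        scores.append((label, abs(p - q)))
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda s: (-s[1], s[0]))


def nearest_neighbour(query: List[int], vectors: Sequence[List[int]],
                      classes: Sequence[int]) -> int:
    """1-NN in the embedding space with Manhattan distance (first match on ties)."""
    dists = [sum(abs(a - b) for a, b in zip(query, v)) for v in vectors]
    return classes[min(range(len(dists)), key=dists.__getitem__)]


if __name__ == "__main__":
    # Toy data: class 1 always contains R1, class 0 always contains R4.
    graphs = [
        [("a", "b", "R1"), ("b", "c", "R2")],
        [("a", "b", "R1"), ("c", "d", "R3")],
        [("a", "b", "R4"), ("b", "c", "R2")],
        [("a", "b", "R4"), ("c", "d", "R3")],
    ]
    labels = [1, 1, 0, 0]
    alphabet = build_alphabet(graphs)
    vectors = [embed(g, alphabet) for g in graphs]
    print(discriminant_labels(vectors, labels, alphabet)[:2])
    # prints [('R1', 1.0), ('R4', 1.0)]
    print(nearest_neighbour(embed([("x", "y", "R1")], alphabet), vectors, labels))
    # prints 1
```

In the toy data, R2 and R3 occur equally often in both classes and therefore score zero, while R1 and R4 are perfectly discriminant; this mirrors, in miniature, the idea of singling out the edges that carry class information.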