A deep learning architecture for metabolic pathway prediction

MOTIVATION: Understanding the mechanisms and structural mappings between molecules and pathway classes is critical for the design of reaction predictors for synthesizing new molecules. This paper studies the problem of predicting the classes of metabolic pathways (series of chemical reactions occurring within a cell) in which a given biochemical compound participates. We apply a hybrid machine learning approach in which graph convolutional networks extract molecular shape features that serve as input to a random forest classifier. In contrast to machine learning methods previously applied to this problem, our framework automatically extracts relevant shape features directly from input SMILES representations, which are atom-bond specifications of the chemical structures composing the molecules.

RESULTS: Our method correctly predicts the metabolic pathway class of 95.16% of tested compounds, whereas competing methods achieve an accuracy of at most 84.92%. Furthermore, our framework extends to the classification of compounds with mixed membership in multiple pathway classes; our prediction accuracy for this multi-label task is 97.61%. We analyze the relative importance of various global physicochemical features for the pathway class prediction problem and show that simple linear/logistic regression models can predict the values of these global features from the shape features extracted by our framework.

AVAILABILITY: https://github.com/baranwa2/MetabolicPathwayPrediction

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
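The abstract does not spell out the network layers. As a minimal sketch, assuming the layer-wise propagation rule of Kipf and Welling's graph convolutional network (symmetric-normalized adjacency with self-loops, followed by ReLU) and mean pooling into one fixed-length vector per molecule, a forward pass over a hand-built molecular graph might look like this. The function names, the one-hot atom features, and the layer sizes are hypothetical; the weights are random rather than trained, and converting a SMILES string into the adjacency matrix (normally done with a cheminformatics toolkit) is omitted.

```python
import numpy as np


def normalized_adjacency(adj: np.ndarray) -> np.ndarray:
    """Return D^-1/2 (A + I) D^-1/2, the symmetric normalization used in GCNs."""
    a_hat = adj + np.eye(adj.shape[0])          # add self-loops so atoms keep their own features
    deg = a_hat.sum(axis=1)                     # node degrees including the self-loop
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    return d_inv_sqrt @ a_hat @ d_inv_sqrt


def gcn_layer(a_norm: np.ndarray, h: np.ndarray, w: np.ndarray) -> np.ndarray:
    """One graph convolution: aggregate neighbour features, project with w, apply ReLU."""
    return np.maximum(a_norm @ h @ w, 0.0)


def graph_embedding(adj: np.ndarray, atom_features: np.ndarray, weights: list) -> np.ndarray:
    """Stack GCN layers, then mean-pool atom embeddings into one molecule-level vector."""
    a_norm = normalized_adjacency(adj)
    h = atom_features
    for w in weights:
        h = gcn_layer(a_norm, h, w)
    return h.mean(axis=0)


# Hypothetical example: heavy-atom graph of ethanol (C-C-O) as an adjacency matrix.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
# Illustrative one-hot atom types; columns = [C, O].
atom_features = np.array([[1, 0],
                          [1, 0],
                          [0, 1]], dtype=float)

rng = np.random.default_rng(0)
weights = [rng.normal(size=(2, 8)), rng.normal(size=(8, 4))]  # untrained, random weights
embedding = graph_embedding(adj, atom_features, weights)
print(embedding.shape)  # (4,)
```

Mean pooling makes the molecule-level vector independent of the order in which atoms are listed, so the same compound always maps to the same feature vector; that vector is the kind of input a downstream classifier such as a random forest would receive.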
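The claim that simple regression models can recover global physicochemical features from the learned shape features amounts to fitting a linear probe on top of the embeddings. A minimal sketch, assuming the shape features have already been collected into a matrix with one row per compound: the data below are synthetic stand-ins (not real descriptors or results from the paper), and the helper names are hypothetical. For a categorical descriptor, logistic regression would take the place of least squares.

```python
import numpy as np


def fit_linear_probe(features: np.ndarray, target: np.ndarray):
    """Ordinary least squares with an intercept; returns (coefficients, intercept)."""
    design = np.hstack([features, np.ones((features.shape[0], 1))])  # append bias column
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef[:-1], coef[-1]


def r_squared(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Synthetic stand-in data: 200 compounds x 16 learned shape features (illustrative only).
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 16))
true_w = rng.normal(size=16)
y = X @ true_w + 0.5 + rng.normal(scale=0.1, size=200)  # hypothetical global descriptor

# Fit on the first 150 compounds and score on the 50 held-out ones.
w, b = fit_linear_probe(X[:150], y[:150])
score = r_squared(y[150:], X[150:] @ w + b)
print(round(score, 3))
```

Scoring on held-out compounds, rather than on the rows used for fitting, guards against reading a high R² that merely reflects overfitting.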
