Cheminformatics approach to exploring and modeling trait-associated metabolite profiles

Developing predictive and transparent approaches to the analysis of metabolite profiles across patient cohorts is critical for understanding the events that trigger or modulate traits of interest (e.g., disease progression, drug metabolism, chemical risk assessment). However, metabolites’ chemical structures are still rarely used in the statistical modeling workflows that establish these trait-metabolite relationships. Herein, we present a cheminformatics-based approach for identifying predictive, interpretable, and reproducible trait-metabolite relationships. As a proof of concept, we use a previously published case study consisting of metabolite profiles from non-small-cell lung cancer (NSCLC) adenocarcinoma patients and healthy controls. By characterizing each structurally annotated metabolite with both computed molecular descriptors and patient metabolite concentration profiles, we show that these complementary features enhance the identification and understanding of key metabolites associated with cancer. We then built multi-metabolite classification models for assessing patients’ cancer status, using groups of structurally similar metabolites identified through chemical clustering. We subsequently performed a metabolic pathway enrichment analysis to identify potential mechanistic relationships between metabolites and NSCLC adenocarcinoma. This cheminformatics-inspired approach relies on the metabolites’ structural features and chemical properties to provide critical information about metabolite-trait associations. It could facilitate biological understanding and advance research based on metabolomics data, especially the identification of novel biomarkers.
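The idea of grouping metabolites by structural similarity can be illustrated with a minimal sketch. It is not the authors' pipeline: the fingerprints below are hypothetical sets of "on" bits rather than descriptors computed from real structures, and the single-linkage grouping with a fixed Tanimoto threshold is an assumption chosen for brevity, since the abstract does not specify the clustering algorithm used.

```python
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_by_similarity(fingerprints: dict, threshold: float = 0.5) -> list:
    """Group metabolites whose pairwise similarity reaches the threshold.

    Uses union-find, so groups are connected components (single linkage).
    This is an illustrative choice, not the method reported in the paper.
    """
    parent = {name: name for name in fingerprints}

    def find(x):
        # Follow parent links to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(fingerprints, 2):
        if tanimoto(fingerprints[a], fingerprints[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in fingerprints:
        groups.setdefault(find(name), []).append(name)
    # Sort members and groups so the output order is deterministic.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Hypothetical fingerprints: placeholder names and bit positions only.
toy_fingerprints = {
    "metabolite_A": frozenset({1, 2, 3, 4}),
    "metabolite_B": frozenset({1, 2, 3, 5}),
    "metabolite_C": frozenset({10, 11, 12}),
    "metabolite_D": frozenset({10, 11, 12, 13}),
}

clusters = cluster_by_similarity(toy_fingerprints, threshold=0.5)
print(clusters)
```

In a real workflow the bit sets would come from a cheminformatics toolkit, and each resulting group could then supply the metabolite concentration features for a separate classification model.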
