PAIRUP-MS: Pathway analysis and imputation to relate unknowns in profiles from mass spectrometry-based metabolite data

Metabolomics is a powerful approach for discovering biomarkers and for characterizing the biochemical consequences of genetic variation. While untargeted metabolite profiling can measure thousands of signals in a single experiment, many biologically meaningful signals cannot be readily identified as known metabolites or compared across datasets, making it difficult to infer biology and to conduct well-powered meta-analyses across studies. To overcome these challenges, we developed a suite of computational methods, PAIRUP-MS, to match metabolite signals across mass spectrometry-based profiling datasets and to generate metabolic pathway annotations for these signals. To pair up signals measured in different datasets, where retention times (RT) are often not comparable or even available, we implemented an imputation-based approach that only requires mass-to-charge ratios (m/z). As validation, we treated each shared known metabolite as an unmatched signal and showed that PAIRUP-MS correctly matched 70–88% of these metabolites from among thousands of signals, equaling or outperforming a standard m/z- and RT-based approach. We performed further validation using genetic data: the most stringent set of matched signals and shared knowns showed comparable consistency of genetic associations across datasets. Next, we developed a pathway reconstitution method to annotate unknown signals using curated metabolic pathways containing known metabolites. We performed genetic validation for the generated annotations, showing that annotated signals associated with gene variants were more likely to be enriched for pathways functionally related to the genes compared to random expectation. Finally, we applied PAIRUP-MS to study associations between metabolites and genetic variants or body mass index (BMI) across multiple datasets, identifying up to ~6 times more significant signals and many more BMI-associated pathways compared to the standard practice of only analyzing known metabolites. These results demonstrate that PAIRUP-MS enables analysis of unknown signals in a robust, biologically meaningful manner and provides a path to more comprehensive, well-powered studies of untargeted metabolomics data.
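
The abstract states that signals are paired across datasets using only m/z, without retention times. As a rough illustration of what an m/z-only candidate filter can look like, the sketch below lists every cross-dataset pair whose m/z values agree within a parts-per-million tolerance. This is not the PAIRUP-MS implementation: the function names, the 10 ppm default, and the example values are all assumptions made for illustration, and the imputation-based step that the abstract describes for choosing among candidates is not shown.

```python
from bisect import bisect_left, bisect_right


def ppm_window(mz, ppm):
    """Return the (low, high) m/z bounds that lie within `ppm` of `mz`."""
    delta = mz * ppm / 1e6
    return mz - delta, mz + delta


def candidate_pairs(mz_a, mz_b, ppm=10.0):
    """List (i, j) index pairs where mz_b[j] is within `ppm` of mz_a[i].

    Hypothetical helper: the tolerance and the windowing rule are
    illustrative assumptions, not values taken from the paper.
    """
    # Sort dataset B by m/z once so each lookup is a binary search.
    order_b = sorted(range(len(mz_b)), key=lambda j: mz_b[j])
    sorted_b = [mz_b[j] for j in order_b]

    pairs = []
    for i, mz in enumerate(mz_a):
        low, high = ppm_window(mz, ppm)
        start = bisect_left(sorted_b, low)
        stop = bisect_right(sorted_b, high)
        # Map positions in the sorted list back to original indices in B.
        pairs.extend((i, order_b[k]) for k in range(start, stop))
    return pairs


if __name__ == "__main__":
    dataset_a = [100.0000, 250.1000]          # made-up m/z values
    dataset_b = [250.1010, 100.0005, 300.0]   # made-up m/z values
    print(candidate_pairs(dataset_a, dataset_b))
```

A window of this kind typically leaves several candidates per signal when thousands of signals are present, which is why a further scoring step is needed to pick a single match among them.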
