Deriving Convergent and Divergent Metabolomic Correlates of Pulmonary Arterial Hypertension

High-dimensional metabolomics analyses may identify convergent and divergent markers, potentially representing aligned or orthogonal disease pathways that underlie conditions such as pulmonary arterial hypertension (PAH). Using a comprehensive PAH metabolomics dataset, we applied six different conventional and statistical learning techniques to identify analytes associated with key outcomes and compared the results. We found that certain conventional techniques, such as Bonferroni/FDR correction, prioritized metabolites that tended to be highly intercorrelated. Statistical learning techniques generally agreed with conventional techniques on the top-ranked metabolites, but were also more inclusive of different metabolite groups. In particular, conventional methods prioritized sterol and oxylipin metabolites in relation to idiopathic versus non-idiopathic PAH, whereas statistical learning methods tended to prioritize eicosanoid, bile acid, fatty acid, and fatty acyl ester metabolites. Our findings demonstrate how conventional and statistical learning techniques can offer both concordant and discordant results. In the case of a rare yet morbid condition, such as PAH, convergent metabolites may reflect common pathways to shared disease outcomes, whereas divergent metabolites could signal distinct etiologic mechanisms, different sub-phenotypes, or varying stages of disease progression. Notwithstanding the need to investigate the mechanisms underlying the observed results, our main findings suggest that a multi-method approach to statistical analyses of high-dimensional human metabolomics datasets could effectively broaden the scientific yield from a given study design.
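As a minimal illustration of the kind of comparison described above, the following Python sketch applies Bonferroni and Benjamini-Hochberg (FDR) thresholds to a set of p-values and then splits two selections into convergent and divergent sets. It is not the study's pipeline: the metabolite names, p-values, the `learning_selected` set, and the helper names are all hypothetical placeholders chosen for illustration.

```python
def bonferroni(pvals, alpha=0.05):
    """Reject H0 where p <= alpha / m (m = number of tests)."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure controlling the FDR at alpha."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    cutoff = 0  # largest rank k such that p_(k) <= (k / m) * alpha
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank / m * alpha:
            cutoff = rank
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


def compare_selections(selected_a, selected_b):
    """Split two selections into convergent (shared) and divergent sets."""
    a, b = set(selected_a), set(selected_b)
    return {"convergent": a & b, "only_a": a - b, "only_b": b - a}


# Hypothetical p-values for five placeholder metabolites (not study data).
names = ["met_a", "met_b", "met_c", "met_d", "met_e"]
pvals = [0.0004, 0.009, 0.02, 0.03, 0.6]

bonf_selected = [n for n, r in zip(names, bonferroni(pvals)) if r]
fdr_selected = [n for n, r in zip(names, benjamini_hochberg(pvals)) if r]

# Hypothetical selection from a statistical learning method (e.g. a
# penalized regression); assumed here purely for illustration.
learning_selected = ["met_a", "met_c", "met_e"]

comparison = compare_selections(fdr_selected, learning_selected)
print("Bonferroni:", bonf_selected)
print("FDR (BH):", fdr_selected)
print("Comparison:", comparison)
```

In this toy example the FDR procedure is less stringent than Bonferroni, and the set comparison separates metabolites that both approaches select from those unique to one approach.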
