Computational platform for doctor–artificial intelligence cooperation in pulmonary arterial hypertension prognostication: a pilot study

Background Pulmonary arterial hypertension (PAH) is a heterogeneous and complex pulmonary vascular disease associated with substantial morbidity. Machine-learning algorithms (used in many PAH risk calculators) can combine established parameters with thousands of circulating biomarkers to optimise PAH prognostication, but these approaches do not show the clinician which parameters drove the prognosis. The approach proposed in this study diverges from other contemporary phenotyping methods by identifying the patient-specific parameters that drive clinical risk.

Methods We trained a random forest algorithm to predict 4-year survival risk in a cohort of 167 adult PAH patients evaluated at Stanford University, with 20% withheld for internal validation. A second cohort of 38 patients from Sheffield University was used for external validation. Shapley values, borrowed from game theory, were computed to rank the input parameters by their importance to the predicted risk score, both for the entire trained random forest model (global importance) and for an individual patient (local importance).

Results Across the internal and external validation cohorts, the random forest model predicted 4-year risk of death/transplant with sensitivity and specificity of 71.0–100% and 81.0–89.0%, respectively. The model reinforced the importance of established prognostic markers, but also identified novel inflammatory biomarkers that predict risk in some PAH patients.

Conclusion These results stress the need for advancing individualised phenotyping strategies that integrate clinical and biochemical data with outcomes. The computational platform presented in this study offers a critical step towards personalised medicine in which a clinician can interpret an algorithm's assessment of an individual patient.

High-throughput biomarker screening and machine learning (ML) are promising new technologies that could revolutionise the way doctors screen PAH patients.
Principles of game theory combined with ML modelling would allow doctor–ML collaboration. https://bit.ly/3FvbXJD
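
The distinction between local and global importance described in the Methods can be illustrated with a minimal sketch. It is not the study's implementation: the trained random forest is replaced by a hypothetical hand-written risk function (`toy_risk`), the three feature names and all values are invented for illustration, and Shapley values are computed by brute-force enumeration of feature subsets, which is only practical for a handful of features. Features outside a coalition are set to an assumed reference ("baseline") patient.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, patient, baseline):
    """Exact Shapley value of each feature for one patient (local importance).

    Features not in a coalition are held at their baseline value.
    """
    names = list(patient)
    n = len(names)
    values = {}
    for name in names:
        others = [f for f in names if f != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without = {
                    f: (patient[f] if f in subset else baseline[f]) for f in names
                }
                with_feature = dict(without)
                with_feature[name] = patient[name]
                # Marginal contribution of `name` to this coalition
                total += weight * (predict(with_feature) - predict(without))
        values[name] = total
    return values


def global_importance(predict, patients, baseline):
    """Mean absolute Shapley value per feature across patients."""
    per_patient = [shapley_values(predict, p, baseline) for p in patients]
    return {
        f: sum(abs(sv[f]) for sv in per_patient) / len(per_patient)
        for f in baseline
    }


def toy_risk(p):
    # Hypothetical stand-in for a trained model; inputs are scaled to [0, 1].
    return (
        0.1
        + 0.4 * p["established_marker"]
        + 0.2 * (1.0 - p["walk_distance"])
        + 0.3 * p["established_marker"] * p["inflammatory_marker"]
    )


# Invented reference patient and two invented patients
baseline = {"established_marker": 0.2, "walk_distance": 0.6, "inflammatory_marker": 0.1}
patient_a = {"established_marker": 0.8, "walk_distance": 0.3, "inflammatory_marker": 0.9}
patient_b = {"established_marker": 0.3, "walk_distance": 0.7, "inflammatory_marker": 0.2}

local_a = shapley_values(toy_risk, patient_a, baseline)
ranking = global_importance(toy_risk, [patient_a, patient_b], baseline)

for feature, phi in sorted(local_a.items(), key=lambda kv: -abs(kv[1])):
    print(f"{feature}: {phi:+.3f}")
```

The local values for one patient sum to the difference between that patient's predicted risk and the baseline prediction, which is what lets a clinician read them as a per-parameter breakdown of the risk score. For a real tree ensemble, brute-force enumeration would be replaced by an algorithm designed for trees.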
