Deep learning on graphs for multi-omics classification of COPD

Network approaches have been used successfully to help reveal the complex mechanisms of diseases, including Chronic Obstructive Pulmonary Disease (COPD). However, despite recent advances, we remain limited in our ability to combine protein-protein interaction (PPI) network information with omics data for disease prediction. New deep learning methods, including convolutional graph neural networks (ConvGNNs), have shown great potential for disease classification using transcriptomics data and known PPI networks from existing databases. In this study, we first reconstructed the COPD-associated PPI network with the AhGlasso (Augmented High-Dimensional Graphical Lasso) algorithm, using an independent transcriptomics dataset of COPD cases and controls. We then extended existing ConvGNN methods to integrate the COPD-associated PPI network with proteomics and transcriptomics data, and developed a prediction model for COPD classification. This approach improves accuracy over several conventional classification methods and over neural networks that do not incorporate network information. We also demonstrate that the updated COPD-associated network reconstructed with AhGlasso further improves prediction accuracy. Although deep neural networks often achieve higher classification performance than other methods, it can be very difficult to explain how these models, especially graph neural networks, make decisions from the given features, and to identify which features contribute most to prediction, both across all samples and for individual samples. To better explain how the spectral-based ConvGNN model works, we applied a unified explainable machine learning method, SHapley Additive exPlanations (SHAP), and identified CXCL11, IL-2, CD48, KIR3DL2, TLR2, BMP10, and several other relevant COPD genes in subnetworks of the ConvGNN model for COPD prediction.
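The abstract does not give the layer equations of the spectral ConvGNN. As a minimal sketch, assuming a Chebyshev-polynomial spectral filter (the ChebNet formulation commonly used by spectral ConvGNNs; that this model uses exactly this filter is an assumption), one graph-convolution layer takes a PPI adjacency matrix A and per-node omics features X and computes the sum over k of T_k(L~) X W_k, where L~ = 2 L / lambda_max - I is the rescaled normalized Laplacian and T_k are Chebyshev polynomials. The function names `scaled_laplacian` and `cheb_conv`, and the toy graph in the usage lines, are hypothetical.

```python
import numpy as np


def scaled_laplacian(adj: np.ndarray) -> np.ndarray:
    """Rescale the normalized graph Laplacian so its spectrum lies in [-1, 1]."""
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    # D^{-1/2}; isolated nodes (degree 0) get 0 instead of a division by zero
    d_inv_sqrt = np.zeros(n)
    d_inv_sqrt[deg > 0] = deg[deg > 0] ** -0.5
    # Normalized Laplacian L = I - D^{-1/2} A D^{-1/2}
    lap = np.eye(n) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    lam_max = np.linalg.eigvalsh(lap).max()
    # L~ = 2 L / lambda_max - I
    return 2.0 * lap / lam_max - np.eye(n)


def cheb_conv(x: np.ndarray, adj: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Chebyshev spectral graph convolution: sum_k T_k(L~) x W_k.

    x       : (n_nodes, f_in) node features, e.g. expression values per gene/protein
    adj     : (n_nodes, n_nodes) symmetric adjacency, e.g. a PPI network
    weights : (K, f_in, f_out) one learnable matrix per polynomial order
    """
    l_tilde = scaled_laplacian(adj)
    t_prev = x                          # T_0(L~) x = x
    out = t_prev @ weights[0]
    if weights.shape[0] > 1:
        t_curr = l_tilde @ x            # T_1(L~) x = L~ x
        out = out + t_curr @ weights[1]
        for k in range(2, weights.shape[0]):
            # Chebyshev recurrence: T_k = 2 L~ T_{k-1} - T_{k-2}
            t_next = 2.0 * l_tilde @ t_curr - t_prev
            out = out + t_next @ weights[k]
            t_prev, t_curr = t_curr, t_next
    return out


if __name__ == "__main__":
    # Toy 3-node path graph standing in for a PPI subnetwork (hypothetical data)
    adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
    x = np.array([[0.5], [1.0], [-0.2]])       # one omics value per node
    rng = np.random.default_rng(0)
    w = rng.normal(size=(3, 1, 4))              # K = 3 orders, 1 -> 4 features
    h = np.maximum(cheb_conv(x, adj, w), 0.0)   # ReLU after the convolution
    print(h.shape)  # (3, 4)
```

Because T_k(L~) is a degree-k polynomial in the Laplacian, the order-k term only mixes features from nodes within k hops, so a layer with K weight matrices stays localized to K - 1 hops of the network. Multi-omics inputs fit naturally as extra feature columns in x, one per omics layer.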
Finally, Gene Ontology (GO) enrichment analysis identified glycosaminoglycan, heparin signaling, and carbohydrate derivative signaling pathways as significantly enriched among the genes/proteins most important for COPD classification.
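The abstract does not state which statistical test or multiple-testing correction was used for the GO analysis. One common way to score enrichment of a top-ranked gene list is a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test) followed by Benjamini-Hochberg adjustment; the sketch below assumes that approach. The function names and the GO term counts in the usage lines are hypothetical, not results from this study.

```python
from math import comb


def hypergeom_enrichment_p(n_universe: int, n_term: int, n_selected: int, n_overlap: int) -> float:
    """One-sided enrichment p-value P(X >= n_overlap) under the hypergeometric null.

    n_universe : genes/proteins in the background set (N)
    n_term     : background genes annotated to the GO term (K)
    n_selected : top-ranked genes/proteins, e.g. by mean |SHAP| value (n)
    n_overlap  : selected genes annotated to the term (k)
    """
    upper = min(n_term, n_selected)
    tail = sum(
        comb(n_term, i) * comb(n_universe - n_term, n_selected - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / comb(n_universe, n_selected)


def benjamini_hochberg(pvals: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (false discovery rate control)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical counts per GO term: (annotated in background, annotated in top list)
    universe, top_n = 1000, 50
    terms = {"term_A": (40, 8), "term_B": (120, 9), "term_C": (15, 1)}
    raw = {t: hypergeom_enrichment_p(universe, n_term, top_n, n_hit)
           for t, (n_term, n_hit) in terms.items()}
    adjusted = dict(zip(raw, benjamini_hochberg(list(raw.values()))))
    for term in terms:
        print(term, f"p={raw[term]:.3g}", f"q={adjusted[term]:.3g}")
```

Using exact integer arithmetic via `math.comb` avoids underflow for the modest gene-set sizes typical of a top-ranked list; the background set should be the genes/proteins actually measured, not the whole genome.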