netDx: Interpretable patient classification using integrated patient similarity networks

Patient classification has widespread biomedical and clinical applications, including diagnosis, prognosis and treatment response prediction. A clinically useful prediction algorithm should be accurate and generalizable, integrate diverse data types, and handle sparse data. A clinical predictor based on genomic data also needs to be easily interpretable to drive hypothesis-driven research into new treatments. We describe netDx, a novel supervised patient classification framework based on patient similarity networks. netDx meets the above criteria and particularly excels at data integration and model interpretability. As a machine learning method, netDx demonstrates consistently excellent performance in a cancer survival benchmark across four cancer types by integrating up to six genomic and clinical data types. In these tests, netDx has significantly higher average performance than most other machine learning approaches across most cancer types, and its best model outperforms all other methods for two cancer types. Compared with traditional machine learning-based patient classifiers, netDx produces more interpretable results by visualizing the decision boundary in the context of patient similarity space. When patient similarity is defined by pathway-level gene expression, netDx identifies biological pathways important for outcome prediction, as demonstrated in diverse data sets of breast cancer and asthma. Thus, netDx can serve both as a patient classifier and as a tool for discovery of biological features characteristic of disease. We provide a complete software implementation of netDx along with sample files and automation workflows in R.
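
To make "patient similarity defined by pathway-level gene expression" concrete, here is a minimal sketch of building one similarity network per pathway. It assumes Pearson correlation, computed over only the genes in the pathway, as the similarity measure; the function names, gene symbols and toy data are hypothetical illustrations and are not netDx's R interface or its exact procedure.

```python
import math
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length vectors.

    Returns 0.0 if fewer than two values are given or either vector has zero variance.
    """
    n = len(x)
    if n < 2:
        return 0.0
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        return 0.0
    return cov / (sx * sy)


def pathway_similarity_network(
    expr: Dict[str, List[float]],
    gene_index: Dict[str, int],
    pathway_genes: List[str],
) -> Dict[Tuple[str, str], float]:
    """Return one weighted patient-patient network for a single pathway.

    Each edge weight is the Pearson correlation (an assumed similarity measure)
    between two patients' expression values restricted to the pathway's genes.
    Pathway genes absent from gene_index are skipped.
    """
    cols = [gene_index[g] for g in pathway_genes if g in gene_index]
    patients = sorted(expr)
    edges: Dict[Tuple[str, str], float] = {}
    for i, a in enumerate(patients):
        profile_a = [expr[a][c] for c in cols]
        for b in patients[i + 1:]:
            profile_b = [expr[b][c] for c in cols]
            edges[(a, b)] = pearson(profile_a, profile_b)
    return edges


# Hypothetical toy data: 3 patients x 4 genes.
expr = {
    "P1": [1.0, 2.0, 3.0, 10.0],
    "P2": [2.0, 4.0, 6.0, 0.0],
    "P3": [3.0, 2.0, 1.0, 5.0],
}
gene_index = {"GENE_A": 0, "GENE_B": 1, "GENE_C": 2, "GENE_D": 3}
pathway = ["GENE_A", "GENE_B", "GENE_C"]  # hypothetical pathway gene set

edges = pathway_similarity_network(expr, gene_index, pathway)
for (a, b), w in edges.items():
    print(f"{a}-{b}: {w:.2f}")
```

In this toy example, P1 and P2 are perfectly correlated within the pathway even though they differ sharply on GENE_D, which lies outside it. Building a separate network for each pathway in this way is what allows the contribution of each pathway to prediction to be assessed on its own.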
