Integrative phenotyping framework (iPF): integrative clustering of multiple omics data identifies novel lung disease subphenotypes

Background: The increasing availability of multi-omics information on carefully phenotyped patients in studies of complex diseases requires novel methods for data integration. Unlike the continuous intensity measurements in most omics data sets, phenome data contain clinical variables that are binary, ordinal and categorical.

Results: In this paper we introduce an integrative phenotyping framework (iPF) for disease subtype discovery. A feature topology plot was developed for effective dimension reduction and visualization of multi-omics data. The approach is free of model assumptions and robust to noise and missing data. We developed a workflow to integrate homogeneous patient clustering from different omics data in an agglomerative manner and then visualized heterogeneous clustering of pairwise omics sources. We applied the framework to two batches of lung samples obtained from patients diagnosed with chronic obstructive lung disease (COPD) or interstitial lung disease (ILD) with well-characterized clinical (phenomic) data, mRNA and microRNA expression profiles. Application of iPF to the first (training) batch identified clusters of patients with homogeneous disease phenotypes as well as clusters with intermediate disease characteristics. Analysis of the second batch revealed a similar data structure, confirming the presence of intermediate clusters. Genes in the intermediate clusters were enriched with inflammatory and immune functional annotations, suggesting that they represent mechanistically distinct disease subphenotypes that may respond to immunomodulatory therapies. The iPF software package and all source code are publicly available.

Conclusions: Identification of subclusters with distinct clinical and biomolecular characteristics suggests that integration of phenomic and other omics information could lead to identification of novel mechanism-based disease subphenotypes.
