Integrative modeling of multi-omics data to identify cancer drivers and infer patient-specific gene activity

Background: High-throughput technologies have been used to profile genes along multiple dimensions, such as genetic variation, copy number, gene and protein expression, epigenetics, and metabolomics. Computational analyses often treat these data types as independent, which inflates the number of features, leaves studies underpowered and, more importantly, fails to provide a comprehensive view of a gene’s state. We sought to infer gene activity by integrating these dimensions using biological knowledge of oncogenes and tumor suppressors.

Results: This paper proposes an integrative model of oncogene and tumor suppressor activity in cells, which is used to identify cancer drivers and to compute patient-specific gene activity scores. We developed a Fuzzy Logic Modeling (FLM) framework to combine biological knowledge with multi-omics data such as somatic mutation, gene expression, and copy number measurements. The advantage of a fuzzy logic approach is that it abstracts meaningful biological rules from low-level numerical data. Biological knowledge is often qualitative, so combining it with quantitative measurements may yield new biological insights about a gene’s state. We show that the oncogenic state or altered tumor-suppressing state of a gene is better characterized by integrating different molecular measurements with biological knowledge than by any single data type alone. We validate the gene activity score using data from the Cancer Cell Line Encyclopedia and drug sensitivity data for five compounds: BYL719 (PIK3CA inhibitor), PLX4720 (BRAF inhibitor), AZD6244 (MEK inhibitor), Erlotinib (EGFR inhibitor), and Nutlin-3 (MDM2 inhibitor). For the known targets of these compounds, the integrative score predicts drug sensitivity better than each data type alone. The gene activity scores are also used to cluster colorectal cancer (CRC) cell lines. Two CRC subtypes were found, and potential cancer drivers and therapeutic targets were identified for each subtype.

Conclusions: We propose a fuzzy logic based approach to infer gene activity in cancer by integrating numerical data with descriptive biological knowledge. We compute general patient-specific gene-level scores that are useful for determining the oncogenic or tumor suppressor status of cancer driver genes and for clustering or classifying patients.
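To make the fuzzy logic idea concrete, the following is a minimal sketch of how qualitative rules might be combined with numerical measurements into a single gene activity score. It is not the FLM framework from the paper: the membership function `ramp`, the thresholds, the two rules per gene class, and the function names `oncogene_activity` and `suppressor_loss` are all assumptions chosen for illustration. Fuzzy AND is taken as `min` and fuzzy OR as `max`, which is one common convention.

```python
def ramp(x, lo, hi):
    """Piecewise-linear membership: 0 at or below lo, 1 at or above hi."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def oncogene_activity(mutated, expr_z, cn_log2):
    """Hypothetical activity score in [0, 1] for an oncogene.

    mutated : bool, somatic mutation present (assumed activating)
    expr_z  : float, expression z-score (assumed scale)
    cn_log2 : float, log2 copy number ratio (assumed scale)
    """
    # Illustrative memberships; the thresholds are assumptions.
    high_expr = ramp(expr_z, 0.0, 2.0)
    amplified = ramp(cn_log2, 0.0, 1.0)
    has_mutation = 1.0 if mutated else 0.0

    # Rule 1: IF mutated THEN active.
    rule_mutation = has_mutation
    # Rule 2: IF amplified AND highly expressed THEN active (AND = min).
    rule_dosage = min(amplified, high_expr)

    # Aggregate the rules with fuzzy OR (max).
    return max(rule_mutation, rule_dosage)


def suppressor_loss(mutated, expr_z, cn_log2):
    """Hypothetical loss-of-function score in [0, 1] for a tumor suppressor."""
    low_expr = ramp(-expr_z, 0.0, 2.0)
    deleted = ramp(-cn_log2, 0.0, 1.0)
    has_mutation = 1.0 if mutated else 0.0

    # Rule 1: IF mutated THEN function lost.
    # Rule 2: IF deleted AND lowly expressed THEN function lost.
    return max(has_mutation, min(deleted, low_expr))


if __name__ == "__main__":
    # Made-up measurements for a single gene in a single sample.
    print(oncogene_activity(mutated=False, expr_z=1.0, cn_log2=0.5))
    print(suppressor_loss(mutated=False, expr_z=-2.0, cn_log2=-0.5))
```

In this sketch a mutation alone saturates the score, while copy number and expression only contribute when they agree. That is one way of encoding the kind of qualitative knowledge the abstract describes; a real rule base would need to be derived from the biology of each gene class.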
