Pathway activity inference for multiclass disease classification through a mathematical programming optimisation framework

Background: Applying machine learning methods to microarray gene expression profiles for disease classification is a popular way to derive biomarkers, i.e. sets of genes that can predict disease state or outcome. Traditional approaches that treat the expression of each gene independently suffer from low prediction accuracy and are difficult to interpret biologically. Current research efforts focus on integrating information on protein interactions, through biochemical pathway datasets, with expression profiles to propose pathway-based classifiers that can enhance disease diagnosis and prognosis. As most pathway activity inference methods in the literature are either unsupervised or applied to two-class datasets, there is good scope to address these limitations with novel methodologies.

Results: A supervised multiclass pathway activity inference method using optimisation techniques is reported. For each pathway expression dataset, the expression patterns of its constituent genes are summarised into one composite feature, termed pathway activity, and a novel mathematical programming model is proposed to infer this feature as a weighted linear summation of the expression of its constituent genes. Gene weights are determined by the optimisation model so that the resulting pathway activity has optimal discriminative power with regard to disease phenotypes. Classification is then performed on the resulting low-dimensional pathway activity profile.

Conclusions: The model was evaluated on a variety of published gene expression profiles that cover different types of disease. We show that it not only improves classification accuracy but also performs well on multiclass disease datasets, a limitation of other approaches from the literature. A desirable feature of the model is that the user can pre-specify the maximum number of genes that may participate in determining pathway activity. Overall, this work highlights the potential of building pathway-based multi-phenotype classifiers for accurate disease diagnosis and prognosis.
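To make the weighted linear summation concrete, the following minimal sketch computes a pathway activity value per sample from a set of gene weights. It is illustrative only: the function name `pathway_activity`, the gene names, the expression values, and the weights are all hypothetical, and the weights here are fixed by hand rather than obtained from the mathematical programming model described in the abstract. The `max_genes` argument only checks a user-specified limit on the number of genes with nonzero weight; it does not enforce it the way an optimisation constraint would.

```python
def pathway_activity(expression, weights, max_genes=None):
    """Return one activity value per sample as a weighted sum of gene expression.

    expression: list of dicts mapping gene name -> expression value (one dict per sample)
    weights:    dict mapping gene name -> weight (hypothetical values, not optimised)
    max_genes:  optional cap on the number of genes with nonzero weight
    """
    # Genes with a nonzero weight are the ones that participate in the activity.
    active = [gene for gene, w in weights.items() if w != 0.0]
    if max_genes is not None and len(active) > max_genes:
        raise ValueError("more genes have nonzero weight than max_genes allows")
    # Pathway activity of a sample = sum over active genes of weight * expression.
    return [sum(weights[g] * sample[g] for g in active) for sample in expression]


# Hypothetical toy data: two samples, one pathway with three member genes.
samples = [
    {"geneA": 2.0, "geneB": 1.0, "geneC": 0.5},
    {"geneA": 0.0, "geneB": 3.0, "geneC": 1.0},
]
weights = {"geneA": 0.5, "geneB": -1.0, "geneC": 0.0}

activities = pathway_activity(samples, weights, max_genes=2)
print(activities)
```

Each pathway contributes a single value per sample in this way, so a dataset with many genes collapses to one feature per pathway before a classifier is applied.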
