Network and Data Integration for Biomarker Signature Discovery via Network Smoothed T-Statistics

Predictive, stable and interpretable gene signatures are generally seen as an important step towards better personalized medicine, and various methods have been proposed for finding them during the last decade. However, one important obstacle to making gene signatures a standard clinical tool is their typically low reproducibility, combined with the difficulty of achieving a clear biological interpretation. To address this, recent years have seen growing interest in approaches that integrate information from molecular interaction networks. Here we propose a technique that integrates network information as well as different kinds of experimental data (exemplified here by mRNA and miRNA expression) into one classifier. This is done by smoothing the t-statistics of individual genes or miRNAs over the structure of a combined protein-protein interaction (PPI) and miRNA-target gene network. A permutation test is then conducted to select features in a highly consistent manner, and subsequently a Support Vector Machine (SVM) classifier is trained on the selected features. Compared to several competing methods, our algorithm achieves overall better prediction performance for early versus late disease relapse and higher signature stability. Moreover, the obtained gene lists can be clearly associated with biological knowledge, such as known disease genes and KEGG pathways. We demonstrate that our data integration strategy can improve classification performance compared to using a single data source only. Our method, called stSVM, is available in the R package netClass on CRAN (http://cran.r-project.org).
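
The smoothing and selection steps described above can be illustrated with a minimal NumPy sketch. It is not the netClass implementation, and several choices are assumptions made only for illustration: the kernel is taken to be a p-step random walk kernel on the normalized graph Laplacian with hypothetical parameters `a` and `p`, absolute Welch t-statistics are smoothed, class labels are permuted to build the null distribution, and all function names and the toy data are invented here. The final SVM training step is omitted; any standard SVM implementation could be fitted on the selected columns.

```python
import numpy as np

def welch_t(X, y):
    """Per-feature Welch t-statistic; X is samples x features, y holds 0/1 labels."""
    a, b = X[y == 1], X[y == 0]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return (a.mean(axis=0) - b.mean(axis=0)) / se

def random_walk_kernel(A, a=2.0, p=2):
    """Assumed smoothing kernel (a*I - L_norm)^p for a symmetric adjacency matrix A."""
    deg = A.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg, dtype=float)
    d_inv_sqrt[deg > 0] = deg[deg > 0] ** -0.5
    # Normalized Laplacian: I - D^(-1/2) A D^(-1/2)
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    return np.linalg.matrix_power(a * np.eye(len(A)) - L, p)

def smoothed_scores(X, y, K):
    """Spread each feature's absolute t-statistic to its network neighbours."""
    return K @ np.abs(welch_t(X, y))

def permutation_pvalues(X, y, K, n_perm=200, seed=0):
    """Empirical p-value per feature, obtained by permuting the class labels."""
    rng = np.random.default_rng(seed)
    observed = smoothed_scores(X, y, K)
    exceed = np.zeros_like(observed)
    for _ in range(n_perm):
        exceed += smoothed_scores(X, rng.permutation(y), K) >= observed
    return (exceed + 1) / (n_perm + 1)

# Toy data (invented): 40 samples, 6 features; features 0-2 carry the class signal.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 6))
X[y == 1, :3] += 3.0

# Toy network (invented): two chains, 0-1-2 and 3-4-5.
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (3, 4), (4, 5)]:
    A[i, j] = A[j, i] = 1.0

K = random_walk_kernel(A)
pvals = permutation_pvalues(X, y, K)
selected = np.flatnonzero(pvals < 0.05)  # features that would be passed on to the SVM
```

In this sketch a feature with a weak t-statistic of its own can still be selected if its network neighbours are strongly differential, which is the intuition behind smoothing over the network. The significance threshold of 0.05 and the number of permutations are arbitrary illustrative values.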
