Graph based fusion of miRNA and mRNA expression data improves clinical outcome prediction in prostate cancer

Background

One of the main goals in cancer studies that include high-throughput microRNA (miRNA) and mRNA data is to find and assess prognostic signatures capable of predicting clinical outcome. Both mRNA and miRNA expression changes in cancer have been described as reflecting clinical characteristics such as staging and prognosis. Furthermore, miRNA abundance can directly affect target transcripts and translation in tumor cells. Prediction models are usually trained to identify either mRNA or miRNA signatures for patient stratification. With the increasing number of microarray studies collecting mRNA and miRNA from the same patient cohort, there is a need for statistical methods that integrate or fuse both kinds of data into one prediction model in order to find a combined signature that improves the prediction.

Results

Here, we propose a new method to fuse miRNA and mRNA data into one prediction model. Since miRNAs are known regulators of mRNAs, we used the correlations between them, together with target prediction information, to build a bipartite graph representing the relations between miRNAs and mRNAs. This graph was used to guide the feature selection in order to improve the prediction. The method is illustrated on a prostate cancer data set comprising 98 patient samples with miRNA and mRNA expression data. Biochemical relapse was used as the clinical endpoint. The bipartite graph, in combination with both data sets, improved prediction performance as well as the stability of the feature selection.

Conclusions

Fusion of mRNA and miRNA expression data into one prediction model improves clinical outcome prediction in terms of prediction error and stable feature selection. The R source code of the proposed method is available in the supplement.
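The graph construction described in the Results can be illustrated with a minimal sketch. This is not the authors' R implementation. It assumes, purely for illustration, that an edge is kept when a miRNA-mRNA pair is a predicted target pair and its Pearson correlation across samples falls below a negative cutoff. The function name `build_bipartite_edges`, the cutoff `max_corr`, and the restriction to negative correlations are all hypothetical choices, since the abstract only states that correlations and target predictions were combined.

```python
import numpy as np


def build_bipartite_edges(mirna, mrna, predicted_targets, max_corr=-0.3):
    """Return {(mirna_idx, mrna_idx): correlation} for the retained edges.

    mirna: array of shape (n_samples, n_mirnas)
    mrna:  array of shape (n_samples, n_mrnas)
    predicted_targets: iterable of (mirna_idx, mrna_idx) pairs
    max_corr: hypothetical cutoff; keep pairs with correlation <= max_corr
    Assumes no feature is constant across samples (non-zero variance).
    """
    n_samples = mirna.shape[0]
    # Standardize each feature so the cross-product gives Pearson correlations.
    z_mirna = (mirna - mirna.mean(axis=0)) / mirna.std(axis=0)
    z_mrna = (mrna - mrna.mean(axis=0)) / mrna.std(axis=0)
    corr = z_mirna.T @ z_mrna / n_samples  # shape (n_mirnas, n_mrnas)

    edges = {}
    for i, j in predicted_targets:
        if corr[i, j] <= max_corr:
            edges[(i, j)] = float(corr[i, j])
    return edges


# Toy data: 4 samples, 2 miRNAs, 2 mRNAs (made-up values).
toy_mirna = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
toy_mrna = np.array([[4.0, 1.0], [3.0, 2.0], [2.0, 3.0], [1.0, 4.0]])
toy_targets = {(0, 0), (0, 1), (1, 0)}

toy_edges = build_bipartite_edges(toy_mirna, toy_mrna, toy_targets)
for pair, r in sorted(toy_edges.items()):
    print(pair, round(r, 3))
```

In this sketch the edge set defines which miRNAs and mRNAs are linked. How the resulting graph then guides feature selection in the prediction model is specific to the proposed method and is not reproduced here.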
