Data integration by multi-tuning parameter elastic net regression

Background: To integrate molecular features from multiple high-throughput platforms in prediction, a regression model that penalizes features from all platforms equally is commonly used. However, data from different platforms are likely to differ in effect sizes, the proportion of predictive features, and correlation structures. Shrinking all features equally may therefore miss subtle but important features.

Results: We propose an elastic net (EN) model with a separate tuning parameter for each platform that can be fit using standard software. In a comprehensive simulation study, we evaluated the performance of EN logistic regression with multiple tuning parameters. We found that when the number of informative features differs among the platforms, and when there is no notable correlation between the features from different platforms, the multi-tuning parameter EN yields more predictive models. Moreover, the multi-tuning parameter EN is robust, in the sense that there is no loss of predictivity relative to a single tuning parameter EN when features across all platforms have similar effects. We also investigated the performance of the multi-tuning parameter EN using real cancer datasets.

Conclusion: The proposed multi-tuning parameter EN model, fit using standard penalized regression software, can achieve better prediction in sample classification when integrating multiple genomic platforms than the traditional method, in which a single penalty parameter is used for all features across platforms.
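To make the idea of platform-specific tuning parameters concrete, the following is a minimal sketch and not the authors' implementation. It assumes the penalty takes the form of a sum over platforms g of lambda_g * (alpha * |beta_g|_1 + (1 - alpha)/2 * |beta_g|_2^2), and it fits the logistic model with a hand-written proximal gradient loop rather than the standard penalized regression software mentioned in the abstract. The function names, the toy data, and the penalty values are all hypothetical.

```python
import math
import random


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_multi_penalty_en(X, y, platform_of, lambdas, alpha=0.5,
                         step=0.1, n_iter=500):
    """Logistic EN with one tuning parameter per platform (illustrative).

    platform_of[j] is the platform index of feature j and lambdas[g] is the
    penalty strength assumed for platform g.
    """
    n, p = len(X), len(X[0])
    beta, intercept = [0.0] * p, 0.0
    for _ in range(n_iter):
        # Residuals (fitted probability minus label) of the current fit.
        resid = []
        for row, target in zip(X, y):
            eta = intercept + sum(b * x for b, x in zip(beta, row))
            resid.append(1.0 / (1.0 + math.exp(-eta)) - target)
        intercept -= step * sum(resid) / n  # the intercept is not penalized
        new_beta = []
        for j in range(p):
            lam = lambdas[platform_of[j]]
            grad = sum(r * row[j] for r, row in zip(resid, X)) / n
            grad += lam * (1.0 - alpha) * beta[j]  # gradient of the ridge part
            # Gradient step, then soft-thresholding for the lasso part.
            new_beta.append(
                soft_threshold(beta[j] - step * grad, step * lam * alpha))
        beta = new_beta
    return intercept, beta


# Toy data: two "platforms" of three features each; only platform 0 is
# informative. The penalty values are arbitrary choices for illustration.
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(6)] for _ in range(200)]
y = [1.0 if row[0] + row[1] > 0 else 0.0 for row in X]
platform_of = [0, 0, 0, 1, 1, 1]
intercept, beta = fit_multi_penalty_en(X, y, platform_of, lambdas=[0.01, 10.0])
print([round(b, 3) for b in beta])
```

In this toy setup the heavily penalized second platform is shrunk to exactly zero while the lightly penalized first platform keeps its signal. In practice the per-platform values would be chosen by a search such as cross-validation rather than fixed by hand.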
