Regularized group regression methods for genomic prediction: Bridge, MCP, SCAD, group bridge, group lasso, sparse group lasso, group MCP and group SCAD

Background
Genomic prediction is now widely recognized as an efficient, cost-effective and theoretically well-founded method for estimating breeding values using molecular markers spread across the whole genome. The prediction problem entails estimating the effects of all genes or chromosomal segments simultaneously and aggregating them to yield the predicted total genomic breeding value. Many potential methods for genomic prediction exist, but they differ widely in computational cost, complexity and ease of implementation, with significant repercussions for predictive accuracy. We empirically evaluate the predictive performance of several contending regularization methods, designed to accommodate grouping of markers, using three synthetic traits with known true breeding values.

Methods
Each of the competing methods was used to estimate predictive accuracy for each of the three quantitative traits. The traits and an associated genome comprising five chromosomes with 10,000 biallelic single nucleotide polymorphism (SNP) marker loci were simulated for the QTL-MAS 2012 workshop. The models were trained on 3,000 phenotyped and genotyped individuals and used to predict genomic breeding values for 1,020 unphenotyped individuals. Accuracy was expressed as the Pearson correlation between the simulated true and the estimated breeding values.

Results
All the methods produced accurate estimates of genomic breeding values. Contrary to expectation, grouping of markers did not clearly improve accuracy. Selecting the penalty parameter by replicated 10-fold cross-validation often gave better accuracy than using information-theoretic criteria.

Conclusions
All the regularization methods considered produced predictive accuracies that are satisfactory for most practical purposes and thus deserve serious consideration in genomic prediction research and practice. Grouping markers did not enhance predictive accuracy for the synthetic data set considered, although other, more sophisticated grouping schemes could potentially do so. Using cross-validation to select the penalty parameters often yielded higher predictive accuracy than using information-theoretic criteria.
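
The evaluation workflow described above can be illustrated with a short, self-contained sketch. The example below is only an assumption-laden illustration: it uses simulated toy data in place of the QTL-MAS 2012 data set (with far fewer individuals and markers) and scikit-learn's lasso as a stand-in solver, since the grouped and nonconvex penalties compared here (group lasso, sparse group lasso, group bridge, MCP, SCAD and their group versions) require specialized software not shown. It sketches the two elements emphasised in the abstract: tuning the penalty parameter by replicated 10-fold cross-validation and measuring accuracy as the Pearson correlation between estimated and true breeding values of unphenotyped individuals.

```python
# Minimal sketch, not the paper's implementation: fit a penalized regression on
# phenotyped, genotyped training individuals, tune the penalty by replicated
# 10-fold cross-validation, then score predicted genomic breeding values of
# unphenotyped individuals by their Pearson correlation with the true values.
# The lasso stands in for the grouped/nonconvex penalties discussed in the paper.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV, RepeatedKFold

rng = np.random.default_rng(0)

# Toy stand-in for the simulated data: SNP genotypes coded 0/1/2, a sparse set
# of QTL effects, phenotypes = true breeding value + noise.
n_train, n_test, n_snps = 300, 100, 1000           # paper: 3000 / 1020 / 10000
X = rng.integers(0, 3, size=(n_train + n_test, n_snps)).astype(float)
beta = np.zeros(n_snps)
beta[rng.choice(n_snps, 20, replace=False)] = rng.normal(0.0, 1.0, 20)
tbv = X @ beta                                      # true breeding values
y = tbv + rng.normal(0.0, tbv.std(), tbv.shape)     # phenotypes

X_train, y_train = X[:n_train], y[:n_train]
X_test, tbv_test = X[n_train:], tbv[n_train:]

# Penalty parameter chosen by replicated 10-fold cross-validation, mirroring
# the tuning strategy favoured in the abstract over information criteria.
cv = RepeatedKFold(n_splits=10, n_repeats=5, random_state=0)
search = GridSearchCV(
    Lasso(max_iter=10000),
    {"alpha": np.logspace(-3, 1, 20)},
    cv=cv,
    scoring="neg_mean_squared_error",
)
search.fit(X_train, y_train)

# Predictive accuracy: Pearson correlation between estimated and true breeding
# values of the unphenotyped individuals.
gebv = search.best_estimator_.predict(X_test)
accuracy = np.corrcoef(gebv, tbv_test)[0, 1]
print(f"alpha = {search.best_params_['alpha']:.4f}, accuracy = {accuracy:.3f}")
```

In practice, the group penalties would be fitted with dedicated solvers and the marker groups (e.g., by chromosome or linkage-disequilibrium block) supplied explicitly; only the cross-validation loop and the correlation-based accuracy measure carry over unchanged from this sketch.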
