Predictive ability of genome-assisted statistical models under various forms of gene action

Recent work has suggested that the performance of prediction models for complex traits may depend on the architecture of the target traits. Here we compared several prediction models with respect to their ability to predict phenotypes under various statistical architectures of gene action: (1) purely additive, (2) additive and dominance, (3) additive, dominance, and two-locus epistasis, and (4) purely epistatic settings. Both simulated data and a real chicken dataset were used. Fourteen prediction models were compared: BayesA, BayesB, BayesC, Bayesian LASSO, Bayesian ridge regression, elastic net, genomic best linear unbiased prediction, a Gaussian process, LASSO, random forests, reproducing kernel Hilbert spaces regression, ridge regression (best linear unbiased prediction), relevance vector machines, and support vector machines. When the trait was under additive gene action, the parametric prediction models outperformed the non-parametric ones. Conversely, when the trait was under epistatic gene action, the non-parametric prediction models provided more accurate predictions. Thus, prediction models must be selected according to the most probable underlying architecture of the trait. In the chicken dataset examined, most models had similar prediction performance. Our results corroborate the view that there is no universally best prediction model, and that the development of robust prediction models is an important research objective.
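
The kind of comparison described above can be sketched on a toy scale. The following Python example is only an illustration under stated assumptions, not the simulation design used in the study: genotypes are drawn independently at allele frequency 0.5, the trait is either purely additive or purely pairwise epistatic, and just two predictors are compared, ridge regression on markers (a parametric model) and kernel ridge regression with a Gaussian kernel (standing in for the kernel-based non-parametric models). All function names, sample sizes, and tuning values are hypothetical choices made for the sketch.

```python
import numpy as np

def simulate_genotypes(n, p, rng):
    """Centered genotype codes (-1, 0, 1); allele frequency 0.5 is an assumption."""
    return rng.binomial(2, 0.5, size=(n, p)).astype(float) - 1.0

def genetic_signal(Z, architecture, rng, n_terms=5):
    """Genetic values from additive effects, or from pairwise products only."""
    loci = rng.choice(Z.shape[1], size=2 * n_terms, replace=False)
    effects = rng.normal(size=n_terms)
    a, b = loci[:n_terms], loci[n_terms:]
    if architecture == "additive":
        return Z[:, a] @ effects
    if architecture == "epistatic":
        # With centered codes at frequency 0.5, each product is uncorrelated
        # with the single-locus codes, so the signal carries no additive part.
        return (Z[:, a] * Z[:, b]) @ effects
    raise ValueError(f"unknown architecture: {architecture}")

def add_noise(g, h2, rng):
    """Add residual noise so that var(g) / var(y) is roughly h2."""
    sd = np.sqrt(g.var() * (1.0 - h2) / h2)
    return g + rng.normal(scale=sd, size=g.shape)

def fit_predict(K_train, K_test, y_train, lam):
    """Kernel ridge prediction; a linear kernel gives ridge regression on markers."""
    mu = y_train.mean()
    alpha = np.linalg.solve(K_train + lam * np.eye(len(y_train)), y_train - mu)
    return mu + K_test @ alpha

def rbf_kernel(A, B, bandwidth):
    """Gaussian kernel exp(-||a - b||^2 / bandwidth)."""
    d2 = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-d2 / bandwidth)

def compare_models(seed=0, n=400, p=100, h2=0.5, n_test=100):
    """Predictive correlation in a held-out set for each architecture and model."""
    rng = np.random.default_rng(seed)
    Z = simulate_genotypes(n, p, rng)
    tr, te = slice(0, n - n_test), slice(n - n_test, n)
    results = {}
    for arch in ("additive", "epistatic"):
        y = add_noise(genetic_signal(Z, arch, rng), h2, rng)
        preds = {
            # Penalty 0.5 * p and bandwidth p are rough, untuned defaults.
            "ridge": fit_predict(Z[tr] @ Z[tr].T, Z[te] @ Z[tr].T, y[tr], 0.5 * p),
            "rbf": fit_predict(rbf_kernel(Z[tr], Z[tr], p),
                               rbf_kernel(Z[te], Z[tr], p), y[tr], 1.0),
        }
        results[arch] = {m: float(np.corrcoef(y[te], yhat)[0, 1])
                         for m, yhat in preds.items()}
    return results

if __name__ == "__main__":
    for arch, scores in compare_models().items():
        print(arch, scores)
```

Because the penalty and bandwidth are not tuned and the marker panel is small and unlinked, a toy run like this need not reproduce the ranking reported in the abstract; it only shows how predictive correlation can be compared across gene-action settings.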
