Kernel-based whole-genome prediction of complex traits: a review

Prediction of genetic values has been a focus of applied quantitative genetics since the beginning of the 20th century, with renewed interest following the advent of whole genome-enabled prediction. The emergence of high-dimensional genomic data, especially molecular markers, fueled by post-Sanger sequencing technologies has driven researchers to extend Ronald Fisher's and Sewall Wright's models to confront new challenges. In particular, kernel methods are gaining consideration as regression methods of choice for genome-enabled prediction. Complex traits are presumably influenced by many genomic regions working in concert with others (clearly so when considering pathways), thus generating interactions. Motivated by this view, a growing number of kernel-based statistical approaches attempt to capture non-additive effects, either parametrically or non-parametrically. This review centers on whole-genome regression using kernel methods applied to a wide range of quantitative traits of agricultural importance in animals and plants. We discuss various kernel-based approaches tailored to capturing total genetic variation, with the aim of enhancing predictive performance in the light of available genome annotation information. Connections between prediction machines born in animal breeding, statistics, and machine learning are revisited, and their empirical prediction performance is discussed. Overall, while some encouraging results have been obtained with non-parametric kernels, recovering non-additive genetic variation in a validation dataset remains a challenge in quantitative genetics.
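
As a concrete illustration of what whole-genome regression with a non-parametric kernel can look like, the following is a minimal sketch of kernel ridge regression with a Gaussian kernel built from marker genotypes. It is not taken from the review: the function names (`gaussian_kernel`, `kernel_ridge_predict`), the simulated genotypes and phenotypes, and the fixed values of `bandwidth` and `lam` are all assumptions made for illustration. In practice the bandwidth and the regularization (variance ratio) would be tuned or estimated, for example by cross-validation.

```python
import numpy as np


def gaussian_kernel(X_a, X_b, bandwidth):
    """Gaussian kernel between rows of two marker matrices.

    Squared Euclidean distances are scaled by the number of markers so
    that `bandwidth` (an assumed tuning parameter) has a comparable
    meaning across marker panels of different size.
    """
    sq_dist = (
        np.sum(X_a ** 2, axis=1)[:, None]
        + np.sum(X_b ** 2, axis=1)[None, :]
        - 2.0 * X_a @ X_b.T
    )
    sq_dist = np.maximum(sq_dist, 0.0)  # guard against tiny negative round-off
    return np.exp(-bandwidth * sq_dist / X_a.shape[1])


def kernel_ridge_predict(X_train, y_train, X_test, bandwidth=1.0, lam=1.0):
    """Fit kernel ridge regression on training lines and predict test lines.

    `lam` plays the role of a residual-to-genetic variance ratio; it is
    treated as known here, which is a simplifying assumption.
    """
    mu = y_train.mean()
    K = gaussian_kernel(X_train, X_train, bandwidth)
    # Solve (K + lam * I) alpha = y - mu for the dual coefficients.
    alpha = np.linalg.solve(K + lam * np.eye(len(y_train)), y_train - mu)
    K_test = gaussian_kernel(X_test, X_train, bandwidth)
    return mu + K_test @ alpha


# Hypothetical data: 200 individuals, 50 markers coded 0/1/2, and a
# phenotype with one additive effect and one pairwise interaction.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(200, 50)).astype(float)
y = X[:, 0] + X[:, 1] * X[:, 2] + rng.normal(scale=1.0, size=200)

# Simple training/validation split.
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

y_hat = kernel_ridge_predict(X_train, y_train, X_test, bandwidth=1.0, lam=1.0)
print("predictive correlation:", np.corrcoef(y_test, y_hat)[0, 1])
```

The predictive correlation in the validation set is the kind of quantity on which kernels are typically compared; the value obtained here depends entirely on the simulated data and the assumed tuning parameters.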
