Predicting complex traits using a diffusion kernel on genetic markers with an application to dairy cattle and wheat data

Background

Arguably, genotypes and phenotypes may be linked in functional forms that are not well addressed by the linear additive models that are standard in quantitative genetics. Therefore, developing statistical learning models for predicting phenotypic values from all available molecular information that are capable of capturing complex genetic network architectures is of great importance. Bayesian kernel ridge regression is a non-parametric prediction model proposed for this purpose. Its essence is to create a spatial distance-based relationship matrix called a kernel. Although the set of all single nucleotide polymorphism genotype configurations on which a model is built is finite, past research has mainly used a Gaussian kernel.

Results

We sought to investigate the performance of a diffusion kernel, which was specifically developed to model discrete marker inputs, using Holstein cattle and wheat data. This kernel can be viewed as a discretization of the Gaussian kernel. The predictive ability of the diffusion kernel was similar to that of non-spatial distance-based additive genomic relationship kernels in the Holstein data, but outperformed the latter in the wheat data. However, the difference in performance between the diffusion and Gaussian kernels was negligible.

Conclusions

It is concluded that the ability of a diffusion kernel to capture the total genetic variance is not better than that of a Gaussian kernel, at least for these data. Although the diffusion kernel as a choice of basis function may have potential for use in whole-genome prediction, our results imply that embedding genetic markers into a non-Euclidean metric space has very small impact on prediction. Our results suggest that use of the black box Gaussian kernel is justified, given its connection to the diffusion kernel and its similar predictive performance.
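To make the two kernels concrete, the following is a minimal sketch in Python with NumPy, not the authors' implementation. It assumes genotypes coded 0/1/2, a Gaussian kernel on squared Euclidean distance scaled by the number of markers, and the closed-form diffusion kernel of Kondor and Lafferty for strings over a finite alphabet, in which each locus is treated as a complete graph on three genotype states and similarity decays with the Hamming distance. The function names, the bandwidth parameters `theta` and `beta`, and the toy genotype matrix are illustrative assumptions; the abstract does not state the exact parameterisation used in the study.

```python
import numpy as np


def gaussian_kernel(X, theta):
    """Gaussian kernel: exp(-theta * squared Euclidean distance / n_markers)."""
    X = np.asarray(X, dtype=float)
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    d2 = np.maximum(d2, 0.0)  # guard against tiny negative rounding errors
    return np.exp(-theta * d2 / X.shape[1])


def diffusion_kernel(X, beta, n_states=3):
    """Diffusion kernel on discrete genotypes (assumed Hamming-graph form).

    Each locus is modelled as a complete graph on `n_states` genotype codes,
    so the kernel between two individuals is ratio ** (Hamming distance).
    """
    X = np.asarray(X)
    # Number of loci at which each pair of individuals carries different codes.
    hamming = (X[:, None, :] != X[None, :, :]).sum(axis=2)
    e = np.exp(-n_states * beta)
    ratio = (1.0 - e) / (1.0 + (n_states - 1) * e)
    return ratio ** hamming


# Hypothetical toy data: 3 individuals, 4 SNP markers coded 0/1/2.
X = np.array([[0, 1, 2, 0],
              [0, 1, 1, 0],
              [2, 1, 0, 2]])

K_gauss = gaussian_kernel(X, theta=0.5)
K_diff = diffusion_kernel(X, beta=0.5)
```

Either matrix can then stand in for the relationship matrix in a kernel ridge regression. The diffusion form depends only on whether two genotype codes match at a locus, whereas the Gaussian form also uses the numeric difference between codes.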
