Multi-breed genomic prediction using Bayes R with sequence data and dropping variants with a small effect

Background
The increasing availability of whole-genome sequence data is expected to increase the accuracy of genomic prediction. However, results from simulation studies and analyses of real data do not always show an increase in accuracy from sequence data compared with high-density (HD) single nucleotide polymorphism (SNP) chip genotypes. In addition, the sheer number of variants makes analysing all variants, and accurately estimating all their effects, computationally challenging. Our objective was to find a strategy to approximate the analysis of whole-sequence data with a Bayesian variable selection model. Using a simulated dataset, we applied a Bayes R hybrid model to analyse whole-sequence data, test the effect of dropping a proportion of variants during the analysis, and test how the analysis can be split into separate analyses per chromosome to reduce the elapsed computing time. We also investigated the effect of imputation errors on prediction accuracy. Subsequently, we applied the approach to a dataset that contained imputed sequences and records for production and fertility traits for 38,492 Holstein, Jersey, Australian Red and crossbred bulls and cows.

Results
With the simulated dataset, we found that, for a breed that was not represented in the training population, prediction accuracy was substantially higher with sequence data than with HD SNP data. Either dropping part of the variants during the analysis or splitting the analysis into separate analyses per chromosome decreased accuracy compared with analysing whole-sequence data. However, combining the two, i.e. first dropping variants in the analysis of each chromosome and then reanalysing the retained variants together, resulted in an accuracy similar to that obtained when analysing whole-sequence data. Adding imputation errors decreased prediction accuracy, especially for errors in the validation population.
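The abstract does not give the details of the Bayes R hybrid algorithm or of the dropping rule, so the following is only a minimal sketch of the general split-drop-merge idea under clearly stated assumptions. It uses a plain single-site Gibbs sampler (not the hybrid EM/MCMC algorithm) with the common four-component Bayes R prior, in which variant effects come from normal distributions with variances 0, 10^-4, 10^-3 and 10^-2 times the genetic variance. It holds the genetic and residual variances fixed, and it drops variants whose posterior probability of a non-zero effect falls below a hypothetical `threshold` of 0.5. The names `bayes_r_gibbs` and `split_drop_merge` are illustrative and not taken from the paper.

```python
import math
import random

# Bayes R mixture variances as fractions of the genetic variance
# (the usual defaults; assumed here, not taken from the paper).
GAMMA = (0.0, 1e-4, 1e-3, 1e-2)


def bayes_r_gibbs(X, y, var_g, var_e, n_iter=200, burn_in=50, seed=1):
    """Single-site Gibbs sampler with a Bayes R-style mixture prior.

    Simplification: var_g and var_e are held fixed; only the mixture
    proportions and the variant effects are sampled. Returns, for each
    column of X, the posterior probability of a non-zero effect and the
    posterior mean effect.
    """
    rng = random.Random(seed)
    n, p = len(y), len(X[0])
    # Centre each genotype column so that mean(y) plays the role of the intercept.
    cols = []
    for j in range(p):
        col = [row[j] for row in X]
        m = sum(col) / n
        cols.append([v - m for v in col])
    xtx = [sum(v * v for v in col) for col in cols]
    mean_y = sum(y) / n
    resid = [v - mean_y for v in y]  # y minus mean minus current fitted effects
    beta, pi = [0.0] * p, [0.25] * 4
    n_nonzero, sum_beta = [0] * p, [0.0] * p
    for it in range(n_iter):
        counts = [0] * 4
        for j in range(p):
            if xtx[j] == 0.0:  # monomorphic column: leave its effect at zero
                continue
            col = cols[j]
            # r = x_j'(y corrected for the mean and all other effects)
            r = sum(a * e for a, e in zip(col, resid)) + xtx[j] * beta[j]
            logp = []
            for k, g in enumerate(GAMMA):
                v = xtx[j] * var_e + xtx[j] ** 2 * g * var_g  # Var(r | component k)
                logp.append(math.log(pi[k]) - 0.5 * math.log(v) - 0.5 * r * r / v)
            top = max(logp)
            k = rng.choices(range(4), weights=[math.exp(lp - top) for lp in logp])[0]
            if k == 0:
                new = 0.0
            else:
                lhs = xtx[j] + var_e / (GAMMA[k] * var_g)
                new = rng.gauss(r / lhs, math.sqrt(var_e / lhs))
            if new != beta[j]:
                d = new - beta[j]
                resid = [e - a * d for e, a in zip(resid, col)]
                beta[j] = new
            counts[k] += 1
            if it >= burn_in:
                n_nonzero[j] += int(k > 0)
                sum_beta[j] += new
        # Draw the mixture proportions from Dirichlet(counts + 1).
        draws = [rng.gammavariate(cnt + 1.0, 1.0) for cnt in counts]
        total = sum(draws)
        pi = [max(w / total, 1e-12) for w in draws]
    kept = n_iter - burn_in
    return [c / kept for c in n_nonzero], [b / kept for b in sum_beta]


def split_drop_merge(X, y, chrom, var_g, var_e, threshold=0.5):
    """Hypothetical strategy: one analysis per chromosome, drop variants whose
    inclusion probability is below `threshold`, re-analyse the rest jointly."""
    retained = []
    for ch in sorted(set(chrom)):
        idx = [j for j, c in enumerate(chrom) if c == ch]
        pip, _ = bayes_r_gibbs([[row[j] for j in idx] for row in X], y, var_g, var_e)
        retained += [j for j, prob in zip(idx, pip) if prob >= threshold]
    if not retained:
        return [], []
    _, effects = bayes_r_gibbs([[row[j] for j in retained] for row in X], y, var_g, var_e)
    return retained, effects


# Toy data: 3 chromosomes x 15 unlinked variants, three large-effect QTL.
sim = random.Random(42)
n, per_chrom = 200, 15
chrom = [j // per_chrom for j in range(3 * per_chrom)]
freqs = [sim.uniform(0.2, 0.5) for _ in chrom]
X = [[int(sim.random() < f) + int(sim.random() < f) for f in freqs] for _ in range(n)]
qtl = {3: 1.5, 20: -1.5, 38: 1.2}
y = [sum(row[j] * b for j, b in qtl.items()) + sim.gauss(0.0, 1.0) for row in X]
retained, effects = split_drop_merge(X, y, chrom, var_g=2.7, var_e=1.0)
print(len(retained), "of", len(chrom), "variants retained")
```

Each per-chromosome call is independent, so these runs could be distributed over separate processes, which is where a saving in elapsed time would come from; the joint re-analysis then only handles the retained variants. The fixed variances, iteration counts and 0.5 threshold are arbitrary choices for a toy example.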
With real data, using sequence variants resulted in accuracies similar to those obtained with the HD SNPs.

Conclusions
We present an efficient approach to approximate the analysis of whole-sequence data with a Bayesian variable selection model. The lack of increase in prediction accuracy when the approach was applied to real data could be due to imputation errors, which highlights the importance of developing more accurate imputation methods or of directly genotyping the sequence variants that have a major effect in the prediction equation.
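The conclusion attributes the missing gain with real data partly to imputation errors. The toy simulation below, which is not the simulation used in the paper, illustrates why errors in validation genotypes lower accuracy even when marker effects are known exactly. Accuracy is taken as the correlation between true breeding values and predictions. The error model, in which each validation genotype is replaced with probability `error_rate` by an independent Hardy-Weinberg draw, is a hypothetical simplification, and the helper names are illustrative.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def add_imputation_errors(X, freqs, error_rate, rng):
    """Hypothetical error model: with probability error_rate, each genotype is
    replaced by an independent draw under Hardy-Weinberg proportions."""
    noisy = []
    for row in X:
        new_row = []
        for g, f in zip(row, freqs):
            if rng.random() < error_rate:
                g = int(rng.random() < f) + int(rng.random() < f)
            new_row.append(g)
        noisy.append(new_row)
    return noisy


rng = random.Random(7)
n_val, p = 500, 200
freqs = [rng.uniform(0.05, 0.5) for _ in range(p)]
# Every 20th variant is a QTL with a normally distributed effect (known exactly here).
effects = [rng.gauss(0.0, 1.0) if j % 20 == 0 else 0.0 for j in range(p)]
X_val = [[int(rng.random() < f) + int(rng.random() < f) for f in freqs] for _ in range(n_val)]
tbv = [sum(g * b for g, b in zip(row, effects)) for row in X_val]

results = {}
for rate in (0.0, 0.05, 0.2):
    X_obs = add_imputation_errors(X_val, freqs, rate, rng)
    gebv = [sum(g * b for g, b in zip(row, effects)) for row in X_obs]
    results[rate] = pearson(gebv, tbv)
    print(f"error rate {rate:.2f}: accuracy {results[rate]:.3f}")
```

Under this error model, with unlinked loci, the expected correlation is 1 - error_rate, because a corrupted genotype keeps its marginal distribution but loses its covariance with the true genotype. Real imputation errors are unlikely to be this uniform, so the numbers only show the direction of the effect.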
