Genomic breeding value estimation using genetic markers, inferred ancestral haplotypes, and the genomic relationship matrix.

With the introduction of new single nucleotide polymorphism (SNP) chips of various densities, an increasing number of genotype data sets will include animals genotyped for only a subset of the SNP. Imputation techniques based on unobserved ancestral haplotypes may be used to infer the missing genotypes. These ancestral haplotypes may also be used in the genomic prediction model in place of the SNP. This may increase the reliability of predictions, because an ancestral haplotype may capture more linkage disequilibrium with quantitative trait loci than a single SNP does. The aim of this paper was to study whether using unobserved ancestral haplotypes in a genomic prediction model would provide more reliable genomic predictions than using SNP, and to determine how many loci in the genomic prediction model would be redundant. Genotypes of 8,960 bulls and cows for 39,557 SNP were analyzed with a hidden Markov model that associated each individual, at each locus, with 2 ancestral haplotypes. The number of ancestral haplotypes per locus was fixed at 10, 15, or 20. Subsequently, in a validation study, the phenotypes of 3,251 progeny-tested bulls for 16 traits were used in a genomic prediction model to predict the estimated breeding values of at least 753 validation bulls. Averaged across traits, the squared correlation between the genomic prediction and the deregressed estimated breeding value based on daughter performance was slightly higher when 15 or 20 ancestral haplotypes per locus were used in the prediction model instead of the SNP genotypes, whereas the prediction model using a genomic relationship matrix gave the lowest squared correlations. The number of redundant loci [i.e., loci with fewer than 18 jumps (0.1%) from one ancestral haplotype to another ancestral haplotype at the next locus] was 18,793 (48%), which means that only 20,764 loci would need to be included in the genomic prediction model.
This offers an opportunity to greatly reduce the computing requirements of genomic evaluations that use very large numbers of markers.
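
The abstract does not spell out how the ancestral haplotypes enter the prediction model. A minimal sketch, assuming each locus is coded like a multi-allelic marker, counts the copies (0, 1, or 2) of each of the K ancestral haplotypes that an animal carries at that locus. This gives K columns per locus, where a SNP model has a single 0/1/2 allele-count column. The function name `haplotype_covariates` and the input layout (one pair of ancestral labels per animal per locus) are hypothetical, not taken from the authors' software.

```python
from typing import List, Sequence, Tuple


def haplotype_covariates(
    pairs: Sequence[Sequence[Tuple[int, int]]], n_ancestral: int
) -> List[List[int]]:
    """Build one covariate row per animal from HMM ancestral-haplotype assignments.

    Assumed layout (hypothetical): pairs[i][j] = (a, b) are the two ancestral
    haplotype labels (0..n_ancestral - 1) assigned to animal i at locus j.
    Column j * n_ancestral + k holds the number of copies (0, 1, or 2) of
    ancestral haplotype k that the animal carries at locus j.
    """
    rows = []
    for loci in pairs:
        row = [0] * (len(loci) * n_ancestral)
        for j, (a, b) in enumerate(loci):
            # Reject labels outside 0..K-1 (negative indices would silently wrap).
            if not (0 <= a < n_ancestral and 0 <= b < n_ancestral):
                raise ValueError(f"label out of range at locus {j}: {(a, b)}")
            row[j * n_ancestral + a] += 1
            row[j * n_ancestral + b] += 1
        rows.append(row)
    return rows


# Two animals, two loci, K = 3 ancestral haplotypes per locus.
toy = [[(0, 2), (1, 1)], [(2, 2), (0, 1)]]
for row in haplotype_covariates(toy, 3):
    print(row)
```

Each block of K columns sums to 2 for every animal, mirroring how SNP allele counts sum to 2 across the two alleles. With K = 10, 15, or 20 this design matrix is K times wider than a one-column-per-locus coding, which is one reason the number of loci retained matters for computing cost.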
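
The abstract compares these models with a prediction model based on a genomic relationship matrix but does not state how that matrix was constructed. One widely used construction, assumed here, is VanRaden's first method: center the 0/1/2 genotype matrix M by twice the allele frequencies to obtain Z = M − 2p, then compute G = ZZ′ / (2 Σ_j p_j (1 − p_j)). The sketch below estimates p_j from the same genotypes and uses plain Python lists; `genomic_relationship_matrix` is a hypothetical name.

```python
from typing import List, Sequence


def genomic_relationship_matrix(genotypes: Sequence[Sequence[int]]) -> List[List[float]]:
    """G = Z Z' / (2 * sum_j p_j * (1 - p_j)) with Z = M - 2p (assumed VanRaden method 1).

    genotypes[i][j] is the count (0, 1, or 2) of the counted allele at SNP j
    for animal i; allele frequencies p_j are estimated from these genotypes.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Allele frequency of each SNP in the genotyped sample.
    p = [sum(g[j] for g in genotypes) / (2 * n) for j in range(m)]
    # Center each genotype by its expected value 2 * p_j.
    z = [[g[j] - 2 * p[j] for j in range(m)] for g in genotypes]
    scale = 2 * sum(pj * (1 - pj) for pj in p)
    if scale == 0:
        raise ValueError("all SNP are monomorphic; G is undefined")
    return [[sum(zi[j] * zk[j] for j in range(m)) / scale for zk in z] for zi in z]


print(genomic_relationship_matrix([[0, 2], [2, 0]]))
```

The double loop is quadratic in the number of animals and linear in the number of SNP, so dropping redundant loci shortens the inner sum directly; a production implementation would use matrix libraries rather than lists.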
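
The redundancy criterion can be made concrete with a small sketch. It assumes the hidden Markov model output is stored as one row of ancestral-haplotype labels per haplotype (two rows per animal), reads "0.1%" as 0.1% of the 2 × 8,960 = 17,920 haplotypes (0.1% of 17,920 is 17.92, which rounds to the stated 18 jumps), and treats a locus as redundant when fewer than that many haplotypes switch ancestral label between it and the next locus. The names `count_jumps`, `jump_threshold`, and `redundant_loci` are hypothetical.

```python
from typing import List, Sequence


def count_jumps(assignments: Sequence[Sequence[int]]) -> List[int]:
    """Number of haplotypes whose ancestral label changes between locus j and j + 1.

    assignments[h][j] is the ancestral-haplotype label of haplotype h at locus j
    (assumed layout: two rows per animal). The result has n_loci - 1 entries.
    """
    n_loci = len(assignments[0])
    if any(len(row) != n_loci for row in assignments):
        raise ValueError("every haplotype must cover the same loci")
    jumps = [0] * (n_loci - 1)
    for row in assignments:
        for j in range(n_loci - 1):
            if row[j] != row[j + 1]:
                jumps[j] += 1
    return jumps


def jump_threshold(n_individuals: int, fraction: float = 0.001) -> int:
    """Assumption: the 0.1% threshold is taken over all 2 * n_individuals haplotypes."""
    return round(fraction * 2 * n_individuals)


def redundant_loci(assignments: Sequence[Sequence[int]], min_jumps: int) -> List[int]:
    """Indices of loci with fewer than min_jumps jumps to the next locus.

    The last locus has no next locus, so this sketch never reports it as redundant.
    """
    return [j for j, n in enumerate(count_jumps(assignments)) if n < min_jumps]


# Three animals (six haplotypes), four loci.
toy = [[0, 0, 1, 1], [2, 2, 2, 0], [1, 1, 1, 1],
       [0, 0, 0, 2], [3, 3, 1, 1], [1, 1, 1, 1]]
print(count_jumps(toy), redundant_loci(toy, min_jumps=1))
```

Under these assumptions, a redundant locus carries almost the same ancestral-haplotype information as its neighbor, so the prediction model only needs the loci that are not returned (plus the last locus).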