Estimation of genomic prediction accuracy from reference populations with varying degrees of relationship

Genomic prediction is emerging in a wide range of fields, including animal and plant breeding, risk prediction in human precision medicine, and forensics. It is desirable to establish a theoretical framework for genomic prediction accuracy when the reference data consist of information sources with varying degrees of relationship to the target individuals. A reference set can contain close and distant relatives as well as ‘unrelated’ individuals from the wider population. We model the various sources of information as different populations with different effective population sizes (Ne). Both the effective number of chromosome segments (Me) and Ne are treated as functions of the data used for prediction. We validate our theory with analyses of simulated and real data, and illustrate that the variation in genomic relationships with the target is a predictor of the information content of the reference set. With a similar amount of data available from each source, we show that close relatives can have a substantially larger effect on genomic prediction accuracy than less closely related individuals. We also illustrate that when prediction relies on closer relatives, prediction accuracy improves less with an increase in training data or marker panel density. We release software that estimates the expected prediction accuracy and power when combining different reference sources with various degrees of relationship to the target, which is useful when planning genomic prediction (before or after collecting data) in animal, plant and human genetics.
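
The qualitative claims above can be illustrated with a minimal sketch. It assumes an approximation that is widely used in the genomic prediction literature, r = sqrt(N h² / (N h² + Me)), together with the approximation that Me is roughly the reciprocal of the variance of genomic relationships between reference and target individuals. Neither formula is stated in the text above, and both may differ from the authors' derivation. The function names and all numeric inputs are hypothetical and chosen only for illustration; this is not the released software.

```python
import math


def effective_segments_from_variance(var_relationship):
    """Approximate Me as 1 / Var(genomic relationship with the target).

    Assumption for illustration: a larger spread in relationships
    (closer relatives) implies fewer effective segments to estimate.
    """
    if var_relationship <= 0:
        raise ValueError("variance of relationships must be positive")
    return 1.0 / var_relationship


def expected_accuracy(n_reference, h2, me):
    """Expected correlation between predicted and true genetic value.

    Assumed approximation: r = sqrt(N * h2 / (N * h2 + Me)).
    """
    if n_reference < 0 or me <= 0 or not 0.0 <= h2 <= 1.0:
        raise ValueError("need N >= 0, Me > 0 and 0 <= h2 <= 1")
    signal = n_reference * h2
    return math.sqrt(signal / (signal + me))


if __name__ == "__main__":
    h2 = 0.5  # hypothetical heritability
    # Hypothetical relationship variances for two reference sources.
    sources = {"close relatives": 2e-2, "unrelated": 2e-5}
    for label, var_rel in sources.items():
        me = effective_segments_from_variance(var_rel)
        r_1k = expected_accuracy(1000, h2, me)
        r_2k = expected_accuracy(2000, h2, me)
        print(f"{label}: Me={me:.0f}, r(N=1000)={r_1k:.3f}, "
              f"gain from doubling N={r_2k - r_1k:.3f}")
```

Under these assumptions, the source with the larger variance in relationships has the smaller Me and therefore the higher expected accuracy for the same N, and its accuracy gains less from doubling the reference size because it is already close to its ceiling.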
