Overdispersion in allelic counts and θ-correction in forensic genetics.

We present a statistical model that accounts for the extra variability in allelic counts caused by subpopulation structure. In forensic genetics, this effect is modelled by the identical-by-descent parameter θ, which measures the relationship between pairs of alleles within a population relative to the relationship between alleles from different populations (Weir, 2007). In our statistical approach, we demonstrate that θ may be defined as an overdispersion parameter that captures the subpopulation effects. This formulation allows us to derive maximum likelihood estimates of the allele probabilities and θ, to compute the profile log-likelihood, and to construct confidence intervals and hypothesis tests. To compare our method with existing methods, we reanalysed the FBI data from Budowle and Moretti (1999), which contain allele counts in six US subpopulations. Furthermore, we investigate the properties of our methodology through simulation studies.
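To make the overdispersion reading of θ concrete, the following is a minimal sketch, not the paper's implementation. It assumes the overdispersion model is a Dirichlet-multinomial (suggested by the cited literature, but not stated in the abstract) with Dirichlet parameters p_j(1 − θ)/θ. The function names `dm_loglik`, `pooled_frequencies` and `grid_theta` are hypothetical, the allele counts are made up, and the grid search holds the allele probabilities fixed at their pooled frequencies, which is a simplification of the joint maximum likelihood estimation described above.

```python
from math import exp, lgamma


def dm_loglik(counts, p, theta):
    """Log-probability of one subpopulation's allele counts under a
    Dirichlet-multinomial with mean p and overdispersion theta (0 < theta < 1).
    Assumed parametrisation: Dirichlet parameters are p_j * (1 - theta) / theta."""
    n = sum(counts)
    scale = (1.0 - theta) / theta  # sum of the Dirichlet parameters
    ll = lgamma(n + 1) + lgamma(scale) - lgamma(n + scale)
    for x, pj in zip(counts, p):
        ll -= lgamma(x + 1)
        if pj > 0:  # assumes x == 0 whenever p_j == 0
            a = pj * scale
            ll += lgamma(x + a) - lgamma(a)
    return ll


def pooled_frequencies(table):
    """Allele frequencies pooled over all subpopulations (rows of the table)."""
    totals = [sum(col) for col in zip(*table)]
    n = sum(totals)
    return [t / n for t in totals]


def grid_theta(table, grid):
    """Grid value of theta maximising the summed log-likelihood, with allele
    probabilities held fixed at the pooled frequencies (a simplification)."""
    p = pooled_frequencies(table)
    return max(grid, key=lambda th: sum(dm_loglik(row, p, th) for row in table))


if __name__ == "__main__":
    # Sampling two alleles of the same type: the model gives p * (theta + (1 - theta) * p).
    p, theta = [0.3, 0.7], 0.03
    print(exp(dm_loglik([2, 0], p, theta)), p[0] * (theta + (1 - theta) * p[0]))

    # Made-up allele counts: rows are subpopulations, columns are alleles.
    table = [[80, 15, 5], [20, 60, 20], [30, 20, 50]]
    grid = [k / 1000 for k in range(1, 500)]
    print(grid_theta(table, grid))
```

Under this parametrisation the variance of each allele count is inflated by the factor 1 + (n − 1)θ relative to the multinomial, where n is the number of alleles sampled in the subpopulation, so θ close to 0 recovers plain multinomial sampling.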

[1]  B S Weir,et al.  Estimating F-statistics. , 2002, Annual review of genetics.

[2]  Torben Tvedebrink Overdispersion in allelic counts and θ-correction in forensic genetics , 2009 .

[3]  Kent E. Holsinger,et al.  Analysis of Genetic Diversity in Geographically Structured Populations: A Bayesian Perspective , 2004 .

[4]  Bruce Budowle,et al.  Genotype Profiles for Six Population Groups at the 13 CODIS Short Tandem Repeat Core Loci and Other PCR-based Loci , 1999 .

[5]  Á. Carracedo,et al.  Analysis of global variability in 15 established and 5 new European Standard Set (ESS) STRs using the CEPH human genome diversity panel. , 2011, Forensic science international. Genetics.

[6]  David J Balding,et al.  Effects of population structure on DNA fingerprint analysis in forensic science , 1991, Heredity.

[7]  T. Banerjee,et al.  Fisher Information Matrix of the Dirichlet‐multinomial Distribution , 2005, Biometrical journal. Biometrische Zeitschrift.

[8]  J M Curran,et al.  Assessing uncertainty in DNA evidence caused by sampling effects. , 2002, Science & justice : journal of the Forensic Science Society.

[9]  K. Lange,et al.  MM Algorithms for Some Discrete Multivariate Distributions , 2010, Journal of computational and graphical statistics : a joint publication of American Statistical Association, Institute of Mathematical Statistics, Interface Foundation of North America.

[10]  Bruce S Weir,et al.  THE RARITY OF DNA PROFILES. , 2007, The annals of applied statistics.

[11]  C. Field,et al.  Bootstrapping clustered data , 2007 .

[12]  D. Balding,et al.  Significant genetic correlations among Caucasians at forensic DNA loci , 1997, Heredity.

[13]  B Rannala,et al.  Estimating gene flow in island populations. , 1996, Genetical research.

[14]  Kenneth Lange,et al.  Mathematical and Statistical Methods for Genetic Analysis , 1997 .

[15]  Kenneth Lange,et al.  Applications of the Dirichlet distribution to forensic match probabilities , 2005, Genetica.

[16]  I. Evett,et al.  Interpreting DNA Evidence: Statistical Genetics for Forensic Scientists , 1998 .

[17]  Susan A. Murphy,et al.  Monographs on statistics and applied probability , 1990 .

[18]  B. Weir,et al.  Drawing inferences about the coancestry coefficient. , 2009, Theoretical population biology.

[19]  K. Holsinger,et al.  Genetics in geographically structured populations: defining, estimating and interpreting FST , 2009, Nature Reviews Genetics.

[20]  A. Davison,et al.  Non‐parametric bootstrap confidence intervals for the intraclass correlation coefficient , 2003, Statistics in medicine.

[21]  J M Curran,et al.  Interpreting DNA mixtures in structured populations. , 1999, Journal of forensic sciences.

[22]  N. L. Johnson,et al.  Discrete Multivariate Distributions , 1998 .

[23]  J. Mortera,et al.  Sensitivity of inferences in forensic genetics to assumptions about founding genes , 2009, 0908.2862.

[24]  Richard A. Nichols,et al.  A method for quantifying differentiation between populations at multi-allelic loci and its implications for investigating identity and paternity , 2008, Genetica.

[25]  B Budowle,et al.  Population data on the thirteen CODIS core short tandem repeat loci in African Americans, U.S. Caucasians, Hispanics, Bahamians, Jamaicans, and Trinidadians. , 1999, Journal of forensic sciences.

[26]  D. Balding Weight-of-Evidence for Forensic DNA Profiles , 2005 .

[27]  Anthony C. Davison,et al.  Bootstrap Methods and Their Application , 1998 .

[28]  B. Weir Genetic Data Analysis II. , 1997 .

[29]  B. Weir,et al.  ESTIMATING F‐STATISTICS FOR THE ANALYSIS OF POPULATION STRUCTURE , 1984, Evolution; international journal of organic evolution.

[30]  T. Tvedebrink,et al.  Evaluating the weight of evidence by using quantitative short tandem repeat data in DNA mixtures , 2010 .

[31]  Nagaraj K. Neerchal,et al.  An improved method for the computation of maximum likelihood estimates for multinomial overdispersion models , 2005, Comput. Stat. Data Anal..

[32]  J. Mosimann On the compound multinomial distribution, the multivariate β-distribution, and correlations among proportions , 1962 .

[33]  D. Balding Likelihood-based inference for genetic correlation coefficients. , 2003, Theoretical population biology.