A large-scale dataset of single and mixed-source short tandem repeat profiles to inform human identification strategies: PROVEDIt.

DNA-based human identity testing is conducted by comparing PCR-amplified polymorphic short tandem repeat (STR) motifs from a known source with the STR profiles obtained from samples of uncertain origin. Samples such as those recovered from crime scenes often yield signal that is a composite of incomplete STR profiles from an unknown number of unknown contributors, which makes interpretation arduous. To help advance solutions to these STR interpretation challenges, we provide over 25,000 multiplex STR profiles produced from one to five known individuals at target levels ranging from one to 160 copies of DNA. The data, generated under 144 laboratory conditions, are classified by total copy number and contributor proportions. For the 70% of samples that were synthetically compromised, we report the level of DNA damage as measured by quantitative and end-point PCR. In addition, we characterize the complexity of the signal by examining the number of detected alleles in each profile.
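The abstract does not say how detected alleles are counted. The sketch below shows one common way such a complexity measure can be computed, under these assumptions:

- A profile is stored as a mapping from locus name to a list of (allele label, peak height in RFU) pairs.
- A peak counts as detected when it meets a single fixed analytical threshold.

The data layout, the function names `allele_counts` and `min_contributors`, and the threshold value are hypothetical; they are not the PROVEDIt file format or the authors' method.

The lower bound on the number of contributors uses the standard maximum-allele-count argument: a diploid donor contributes at most two alleles per locus. The bound can underestimate the true number when alleles are shared or drop out. It can also be inflated by stutter or other artifacts, which this sketch does not filter.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical layout (assumption): locus name -> list of (allele label, peak height in RFU).
Profile = Dict[str, List[Tuple[str, float]]]


def allele_counts(profile: Profile, analytical_threshold: float) -> Dict[str, int]:
    """Count distinct alleles per locus whose peak height meets the threshold.

    Assumes every peak above threshold is a true allele (no stutter or artifact filtering).
    """
    counts: Dict[str, int] = {}
    for locus, peaks in profile.items():
        detected = {allele for allele, height in peaks if height >= analytical_threshold}
        counts[locus] = len(detected)
    return counts


def min_contributors(counts: Dict[str, int]) -> int:
    """Maximum-allele-count lower bound: each diploid donor adds at most two alleles per locus."""
    max_alleles = max(counts.values(), default=0)
    return math.ceil(max_alleles / 2)


if __name__ == "__main__":
    # Toy two-locus profile; the values are illustrative and not taken from PROVEDIt.
    example: Profile = {
        "D8S1179": [("12", 850.0), ("13", 620.0), ("14", 140.0)],
        "vWA": [("16", 900.0), ("17", 45.0)],
    }
    counts = allele_counts(example, analytical_threshold=50.0)
    print(counts)                    # {'D8S1179': 3, 'vWA': 1}
    print(sum(counts.values()))      # 4 detected alleles in total
    print(min_contributors(counts))  # 2
```

With a 50 RFU threshold, the 45 RFU peak at vWA is treated as not detected. Three alleles at D8S1179 then imply at least two contributors.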
