Global patterns of STR sequence variation: Sequencing the CEPH human genome diversity panel for 58 forensic STRs using the Illumina ForenSeq DNA Signature Prep Kit

The 944 individuals of the CEPH human genome diversity panel (HGDP–CEPH), a standard sample set of 51 globally distributed populations, were sequenced using the Illumina ForenSeq™ DNA Signature Prep Kit. The ForenSeq™ system is a single multiplex for the MiSeq/FGx™ massively parallel sequencing instrument, comprising amelogenin, 27 autosomal STRs, 24 Y-STRs, 7 X-STRs, and 94 SNPforID+Kiddlab autosomal ID-SNPs (plus optionally detected ancestry and phenotyping SNP sets). We report in detail the patterns of sequence variation observed in the repeat regions of the 58 forensic STR loci typed by the ForenSeq™ system. Sequence alleles were characterized and repeat region structures annotated by aligning the ForenSeq™ sequence output to the latest GRCh38 human reference sequence. This required reversing and re-aligning the STR allele sequences reported by the ForenSeq™ system in 20 of the 58 STRs (plus the reverse alleles in two Y-STRs with duplicated-inverted repeat regions). The sample sizes of the individual HGDP–CEPH populations do not allow reliable inferences about levels of genetic variability in low-frequency STR alleles, where particular sequence variants are found in only a few individuals. We nevertheless assessed the occurrence of both population-specific sequence variants and singleton observations, and found each of these in a sizeable proportion of HGDP–CEPH samples. This has consequences for planning the co-ordinated compilation of sequence variation on a much larger scale than was previously required by the forensic laboratories now adopting massively parallel sequencing.
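As a rough illustration of two steps mentioned above, the following Python sketch shows how a reported repeat-region sequence could be reverse-complemented onto the opposite strand, and how singleton and population-specific sequence variants could be tallied. It is not the authors' pipeline. The function names, the `(population, locus, sequence)` input layout, and the choice to count population-specific variants separately from singletons are all assumptions made for this example.

```python
from collections import Counter, defaultdict

# Translation table for complementing DNA bases (N is left as N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the sequence as read on the opposite strand."""
    return seq.translate(_COMPLEMENT)[::-1]


def singleton_and_private_alleles(observations):
    """Classify sequence alleles from (population, locus, sequence) tuples.

    Hypothetical input layout: one tuple per observed allele copy.
    Returns two sets of (locus, sequence) keys:
      - singletons: observed exactly once in the whole panel
      - private: observed more than once, but in a single population only
        (an assumed definition that keeps the two sets disjoint)
    """
    counts = Counter()
    populations = defaultdict(set)
    for population, locus, sequence in observations:
        key = (locus, sequence)
        counts[key] += 1
        populations[key].add(population)

    singletons = {key for key, n in counts.items() if n == 1}
    private = {
        key
        for key, pops in populations.items()
        if len(pops) == 1 and counts[key] > 1
    }
    return singletons, private
```

For loci reported on the reverse strand, `reverse_complement` would be applied to each allele sequence before the repeat structure is annotated against the reference. The second function expects all sequences to be on a common strand already.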
