Enhanced mixture interpretation with macrohaplotypes based on long-read DNA sequencing

Deconvoluting mixture samples is one of the most challenging problems confronting forensic DNA laboratories, and considerable effort has gone into improving mixture interpretation. The probabilistic interpretation of Short Tandem Repeat (STR) profiles has increased the number of complex mixtures that can be analyzed. Nevertheless, a portion of complex mixture profiles, particularly those with a high number of contributors, are still deemed uninterpretable. Novel forensic markers, such as Single Nucleotide Variants (SNVs) and microhaplotypes, have also been proposed to allow for better mixture interpretation. However, these markers both have lower discrimination power than STRs and are incompatible with CODIS and other national DNA databanks worldwide. Short-read sequencing (SRS) technologies can facilitate mixture interpretation by identifying intra-allelic variations within STRs. Unfortunately, the short length of both the STR-containing amplicons and the sequence reads limits the number of alleles that can be attained per STR. The latest long-read sequencing (LRS) technologies can overcome this limitation in samples for which larger DNA fragments (containing both STRs and SNVs) with definitive phasing are available. Based on LRS technologies, this study developed a novel CODIS-compatible forensic marker, called a macrohaplotype, which combines a CODIS STR with its flanking variants to offer an extremely high number of haplotypes and hence very high discrimination power per marker. The macrohaplotype is expected to substantially improve mixture interpretation capabilities. Based on publicly accessible data, a panel of 20 macrohaplotypes, each approximately 8 kbp in size and selected for maximum discrimination power, was designed. The statistical evaluation demonstrates that these macrohaplotypes substantially outperform CODIS STRs for mixture interpretation, particularly for mixtures with a high number of contributors, as well as for other forensic applications. Based on these results, efforts should be undertaken to build a complete workflow, covering both wet-lab and bioinformatics steps, to precisely call the variants and generate the macrohaplotypes using LRS technologies.
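
The link between the number of haplotypes and discrimination power can be illustrated with a minimal sketch. It assumes Hardy-Weinberg proportions and the standard definition of power of discrimination as one minus the sum of squared genotype frequencies. The function name and the equifrequent allele sets are hypothetical illustrations and are not data or methods from this study.

```python
from itertools import combinations_with_replacement


def discrimination_power(freqs):
    """Power of discrimination for one marker, assuming Hardy-Weinberg
    proportions: PD = 1 - sum over genotypes of (genotype frequency)^2."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele/haplotype frequencies must sum to 1")
    match_prob = 0.0
    # Enumerate every unordered genotype (i, j) with i <= j.
    for i, j in combinations_with_replacement(range(len(freqs)), 2):
        if i == j:
            genotype_freq = freqs[i] ** 2            # homozygote
        else:
            genotype_freq = 2 * freqs[i] * freqs[j]  # heterozygote
        match_prob += genotype_freq ** 2
    return 1.0 - match_prob


# Hypothetical, equifrequent allele sets (illustration only):
str_like = [1 / 10] * 10        # a marker with 10 alleles
macro_like = [1 / 100] * 100    # a marker with 100 haplotypes

pd_str = discrimination_power(str_like)
pd_macro = discrimination_power(macro_like)
print(f"10 alleles:     PD = {pd_str:.6f}")
print(f"100 haplotypes: PD = {pd_macro:.6f}")
```

With k equifrequent alleles the random match probability is (2k - 1)/k^3, so it falls quickly as the number of distinguishable haplotypes per marker grows. Real haplotype frequencies are uneven, so actual values would differ from this idealized case.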
