An examination of STR nomenclatures, filters and models for MPS mixture interpretation.

The increased interest in using Massively Parallel Sequencing (MPS) technologies to type traditional autosomal STR markers raises multiple questions regarding interpretation of the results via probabilistic genotyping. To begin to address some of those questions, we examined the effects of using differing degrees of sequence information, pre-filtering, and data modeling to interpret complex MPS-STR mixtures with probabilistic genotyping software. Sixty ForenSeq typing results for mixtures of two to four contributors were 1) represented using three separate formats that captured different degrees of sequence information, and 2) analyzed using three different filtering approaches prior to probabilistic interpretation. All mixtures for the different format and filtering variants were subsequently interpreted with respect to ten reference profiles, using both qualitative (LRmix) and quantitative (EuroForMix) models to calculate the likelihood ratio (LR). The LR results indicated moderate information gain when the STR nomenclature was based upon the longest uninterrupted stretch (LUS) compared with conventional capillary electrophoresis repeat units (RU), whereas additional gains were very small when the complete sequence information was utilized. Use of a static analytical threshold for data pre-filtering improved LRs compared to a dynamic (percentage-based) threshold, as the static threshold prevented excessive filtering of alleles originating from minor contributors. For interpretations performed using a quantitative model, a small improvement in performance was observed if a stutter model was employed instead of using stutter thresholds to pre-filter the data, whereas, as expected, performance worsened considerably under the qualitative model when stutter was not pre-filtered.
Given the empirical and theoretical findings in this study, we discuss the value of utilizing sequence-level information and potential paths forward to increase information gain using MPS systems.
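
The difference between the two pre-filtering approaches can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration only: the read counts, the allele labels, the static threshold of 30 reads, the 5% dynamic threshold, and the function names are hypothetical and are not taken from the study or from any software it used.

```python
def static_filter(reads, threshold=30):
    """Keep alleles whose read count meets a fixed threshold (hypothetical value)."""
    return {allele: n for allele, n in reads.items() if n >= threshold}


def dynamic_filter(reads, fraction=0.05):
    """Keep alleles whose read count is at least a fraction of the locus total
    (hypothetical fraction)."""
    total = sum(reads.values())
    return {allele: n for allele, n in reads.items() if n >= fraction * total}


# Hypothetical read counts at a single locus of a two-person mixture:
# alleles 12 and 14 come from the major contributor,
# alleles 10 and 15 from the minor contributor.
locus_reads = {"10": 45, "12": 1500, "14": 1400, "15": 60}

static_kept = static_filter(locus_reads)
dynamic_kept = dynamic_filter(locus_reads)

print("static :", sorted(static_kept))
print("dynamic:", sorted(dynamic_kept))
```

In this toy case the percentage-based cutoff scales with the high coverage of the major contributor and removes both minor-contributor alleles, while the fixed cutoff retains them. This is the mechanism described above, shown under the stated assumptions rather than with the study's actual settings.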
