Finding more effective microsatellite markers for forensics

Published by the Combined DNA Index System (CODIS) program of the Federal Bureau of Investigation (FBI) in 1997, the 13 core short tandem repeat (STR) loci are widely adopted as genetic markers in forensic applications such as identity testing and paternity testing. However, these loci may be biased and may show reduced sensitivity for specific population groups. In addition, the rapid growth of forensic databases raises the chance of random hits, which can lead to innocent people being wrongly identified as criminals. One solution to these problems is to introduce more effective STR markers. The availability of whole-genome sequencing makes it possible to identify more reliable STR markers for forensic applications computationally. In this study, we propose an algorithm that identifies STR markers with high discriminative ability from next-generation sequencing (NGS) data. The algorithm can select a customized set of loci for a given population with pre-specified discriminative thresholds. We applied the method to 320 Chinese individuals from the 1000 Genomes Project and obtained locus sets of various sizes, which were able to statistically identify an individual worldwide and had higher combined powers of discrimination (CPD) and combined probabilities of exclusion (CPE) than the existing CODIS 13 loci. For identity testing, the mean frequency of DNA profile (FDP) with the selected 11 STRs was smaller than that with the CODIS 13 STRs by Student's t-test. With more loci, much smaller FDPs were obtained. The database matching probabilities (DMP) for the selected loci were also lower than those for the CODIS 13 STRs in a database with 10 billion entries. Moreover, the selected loci gave a chance of random profile matches low enough that, statistically, no false judgments would be expected. The selected loci also reduced the risk of random allele matches in familial searches, with lower random allele matching probabilities. In addition, the selected STRs were statistically better than the CODIS STRs for paternity testing in our simulated data, with lower probabilities of false inclusion and false exclusion.
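
To make the quantities named above concrete, the following is a minimal sketch of how a per-locus power of discrimination, a CPD, an FDP, and a DMP could be computed from allele frequencies. It uses common textbook definitions and assumes Hardy-Weinberg equilibrium within each locus and independence across loci; the study's exact formulas may differ. All function names and the toy allele frequencies are hypothetical and do not come from the study.

```python
from itertools import combinations_with_replacement
from math import expm1, log1p, prod


def genotype_frequencies(allele_freqs):
    """Genotype frequencies at one locus, assuming Hardy-Weinberg equilibrium.

    allele_freqs maps an allele label (e.g. repeat count) to its frequency.
    Keys of the result are (a, b) tuples with a <= b.
    """
    alleles = sorted(allele_freqs)
    freqs = {}
    for a, b in combinations_with_replacement(alleles, 2):
        p, q = allele_freqs[a], allele_freqs[b]
        freqs[(a, b)] = p * p if a == b else 2 * p * q
    return freqs


def power_of_discrimination(allele_freqs):
    """PD = 1 - sum of squared genotype frequencies (textbook definition)."""
    return 1.0 - sum(f * f for f in genotype_frequencies(allele_freqs).values())


def combined_power_of_discrimination(loci):
    """CPD = 1 - product over loci of (1 - PD), assuming independent loci."""
    return 1.0 - prod(1.0 - power_of_discrimination(locus) for locus in loci)


def profile_frequency(loci, profile):
    """FDP: product of the genotype frequencies of a multi-locus profile."""
    return prod(
        genotype_frequencies(locus)[tuple(sorted(genotype))]
        for locus, genotype in zip(loci, profile)
    )


def database_match_probability(fdp, n_entries):
    """Chance of at least one random match among n_entries unrelated profiles.

    Computes 1 - (1 - fdp)**n_entries in a numerically stable way, since
    fdp is typically tiny and n_entries very large.
    """
    return -expm1(n_entries * log1p(-fdp))


# Toy data (hypothetical): two loci, each with two equally common alleles.
toy_loci = [{9: 0.5, 10: 0.5}, {12: 0.5, 13: 0.5}]
toy_profile = [(9, 10), (12, 12)]

pd_one = power_of_discrimination(toy_loci[0])
cpd = combined_power_of_discrimination(toy_loci)
fdp = profile_frequency(toy_loci, toy_profile)
dmp = database_match_probability(fdp, 2)
```

With real STR loci, each having many alleles, the FDP shrinks rapidly as loci are added, which is why adding markers lowers the DMP even for very large databases.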