Secure and robust cloud computing for high-throughput forensic microsatellite sequence analysis and databasing.

Next-generation sequencing (NGS) is a rapidly evolving technology with demonstrated benefits for forensic genetic applications, but strategies to analyze and manage the massive datasets it produces are still in development. Here, the computing, data storage, connectivity, and security resources of the Cloud were evaluated as a model for forensic laboratory systems that produce NGS data. A complete end-to-end Cloud system was developed to upload, process, and interpret raw NGS data through a web browser dashboard. The system was extensible, demonstrating analysis of autosomal and Y-chromosome short tandem repeats (STRs) from a variety of NGS instruments (Illumina MiniSeq and MiSeq, and Oxford Nanopore MinION). NGS data for STRs were concordant with standard reference materials previously characterized with capillary electrophoresis and Sanger sequencing. Cloud computing was configured with on-demand auto-scaling so that multiple files could be analyzed in parallel. The system was designed to store the resulting data in a relational database amenable to downstream sample interpretation and databasing applications, following the most recent nomenclature guidelines for sequenced alleles. Lastly, a multi-layered Cloud security architecture was tested; it showed that industry standards for securing data and computing resources could be readily applied to the NGS system without adverse effects on bioinformatic analysis, connectivity, or data storage and retrieval. The results of this study demonstrate the feasibility of using Cloud-based systems for secured NGS data analysis, storage, databasing, and multi-user distributed connectivity.
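As a rough illustration of the relational storage described above, the following minimal sketch keeps each sequenced STR allele with both its length-based designation and its full sequence, so that alleles of equal length but different sequence remain distinguishable. The table name `str_allele`, its columns, the helper functions, and the sample and locus labels are all hypothetical; the abstract does not describe the actual schema, and the sequences are synthetic.

```python
import sqlite3


def create_store(conn):
    """Create a hypothetical table of sequenced STR alleles (not the paper's schema)."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS str_allele (
               sample_id     TEXT NOT NULL,
               locus         TEXT NOT NULL,
               length_allele TEXT NOT NULL,   -- length-based (CE-style) designation
               sequence      TEXT NOT NULL,   -- full repeat-region sequence
               read_count    INTEGER NOT NULL CHECK (read_count >= 0),
               PRIMARY KEY (sample_id, locus, sequence)
           )"""
    )


def add_allele(conn, sample_id, locus, length_allele, sequence, read_count):
    """Insert one observed allele sequence with its supporting read count."""
    conn.execute(
        "INSERT INTO str_allele VALUES (?, ?, ?, ?, ?)",
        (sample_id, locus, length_allele, sequence, read_count),
    )


def alleles_for(conn, sample_id, locus, min_reads=0):
    """Return (length_allele, sequence, read_count) rows, highest coverage first."""
    cur = conn.execute(
        """SELECT length_allele, sequence, read_count
             FROM str_allele
            WHERE sample_id = ? AND locus = ? AND read_count >= ?
            ORDER BY read_count DESC, sequence""",
        (sample_id, locus, min_reads),
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_store(conn)
    # Two synthetic alleles of the same length (10 repeats) that differ in sequence.
    add_allele(conn, "sample_1", "LOCUS_A", "10", "TCTA" * 10, 120)
    add_allele(conn, "sample_1", "LOCUS_A", "10",
               "TCTA" * 2 + "TCTG" + "TCTA" * 7, 95)
    for row in alleles_for(conn, "sample_1", "LOCUS_A"):
        print(row)
```

Keying rows on the sequence rather than the length-based allele is one possible design choice; it lets a single query report both representations without losing sequence-level variation.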
