SpeedSeq: Ultra-fast personal genome analysis and interpretation

Comprehensive interpretation of human genome sequencing data is a challenging bioinformatic problem that typically requires weeks of analysis and extensive hands-on expert involvement. This informatics bottleneck inflates genome sequencing costs, poses a computational burden for large-scale projects, and impedes the adoption of time-critical clinical applications such as personalized cancer profiling and newborn disease diagnosis, where the actionable timeframe can be measured in hours or days. We developed SpeedSeq, an open-source genome analysis platform that vastly reduces computing time. SpeedSeq performs read alignment, duplicate removal, variant detection, and functional annotation of a 50X human genome in less than 24 hours, even on a single low-cost server. SpeedSeq offers performance competitive with or superior to current methods for detecting germline and somatic single nucleotide variants (SNVs), indels, and structural variants (SVs), and it includes novel functionality for SV genotyping, SV annotation, fusion gene detection, and rapid identification of actionable mutations. SpeedSeq will help bring timely genome analysis into the clinical realm. Availability: SpeedSeq is available at https://github.com/cc2qe/speedseq.
