Comparison of three variant callers for human whole genome sequencing

Genetic testing of patients with genetic disorders is shifting from single-gene assays to gene panel sequencing, whole-exome sequencing (WES) and whole-genome sequencing (WGS). Because WGS is becoming a new foundation for molecular analyses, we compared three currently used tools for variant calling of human WGS data. We tested DeepVariant, a new TensorFlow-based machine learning variant caller, against GATK 4.0 and SpeedSeq, using 30×, 15× and 10× WGS data from the well-known NA12878 DNA reference sample. In the 30× data, SNV calling performance was nearly identical, with all three variant callers reaching F-scores (the harmonic mean of recall and precision) of 0.98. In contrast, DeepVariant called indels more accurately than GATK and SpeedSeq, with F-scores of 0.94, 0.90 and 0.84, respectively. We conclude that DeepVariant has great potential for the analysis of WGS data in medical genetics.
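
The F-score used above can be illustrated with a minimal Python sketch. It assumes the standard definitions of precision, TP / (TP + FP), and recall, TP / (TP + FN), which the abstract itself does not spell out. The function names and the example counts are hypothetical and chosen only for illustration; they are not results from the study.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from true-positive, false-positive
    and false-negative variant counts. Assumes the standard definitions."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def f_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical counts for illustration only, not data from the study.
    p, r = precision_recall(tp=90, fp=10, fn=10)
    print(f"precision={p:.2f} recall={r:.2f} F-score={f_score(p, r):.2f}")
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a caller cannot reach a high F-score by trading many false positives for high recall, or the reverse.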
