Comprehensive Outline of Whole Exome Sequencing Data Analysis Tools Available in Clinical Oncology

Whole exome sequencing (WES) enables the analysis of all protein-coding sequences in the human genome. It allows the investigation of cancer-related genetic aberrations, which are predominantly located in exonic regions, and delivers high-throughput results at a reasonable price. Here, we review analysis tools that enable the use of WES data in clinical and research settings. At the technical level, WES primarily allows the detection of single nucleotide variants (SNVs) and copy number variations (CNVs), and the data obtained through these methods can be combined and further utilized. Variant calling algorithms for SNVs range from standalone tools to machine learning-based combined pipelines. Tools for CNV detection compare the number of reads aligned to a given genomic segment. Both SNVs and CNVs help to identify mutations that result in pharmacologically druggable alterations. The identification of homologous recombination deficiency enables the use of PARP inhibitors. Determining microsatellite instability and tumor mutation burden helps to select patients eligible for immunotherapy. To pave the way for clinical applications, we have to recognize some limitations of WES, including its restricted ability to detect CNVs, its low coverage compared to targeted sequencing, and the lack of consensus regarding references and minimal application requirements. Recently, Galaxy has become the leading platform for non-command line-based WES data processing. The maturation of next-generation sequencing is reinforced by Food and Drug Administration (FDA)-approved methods for cancer screening, detection, and follow-up. WES is on the verge of becoming an affordable and sufficiently evolved technology for everyday clinical use.
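The read-count comparison that underlies CNV detection can be illustrated with a minimal sketch. It is not the method of any specific tool reviewed here: the function names, the pseudocount, and the gain/loss thresholds are hypothetical choices made for illustration, and real tools add corrections (GC content, capture efficiency, tumor purity) that are omitted.

```python
import math


def log2_ratios(tumor_counts, normal_counts, pseudocount=0.5):
    """Return per-segment log2(tumor/normal) read-depth ratios.

    Counts are scaled by each sample's total read count so that
    differences in sequencing depth do not look like copy number changes.
    The pseudocount (an illustrative assumption) avoids division by zero.
    """
    if len(tumor_counts) != len(normal_counts):
        raise ValueError("both samples must cover the same segments")
    tumor_total = sum(tumor_counts)
    normal_total = sum(normal_counts)
    ratios = []
    for tumor, normal in zip(tumor_counts, normal_counts):
        tumor_fraction = (tumor + pseudocount) / tumor_total
        normal_fraction = (normal + pseudocount) / normal_total
        ratios.append(math.log2(tumor_fraction / normal_fraction))
    return ratios


def classify(ratio, gain=0.3, loss=-0.3):
    """Label a segment using hypothetical log2-ratio thresholds."""
    if ratio >= gain:
        return "gain"
    if ratio <= loss:
        return "loss"
    return "neutral"


# Toy read counts for four exonic segments (invented numbers).
tumor = [100, 200, 50, 50]
normal = [100, 100, 100, 100]
ratios = log2_ratios(tumor, normal)
calls = [classify(r) for r in ratios]
```

In this toy input the second segment has roughly twice the normalized depth in the tumor and the last two have roughly half, so they would be labeled as a gain and two losses under the assumed thresholds.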
