isma: an R package for the integrative analysis of mutations detected by multiple pipelines

Background
Recent comparative studies have shown that detecting somatic mutations from next-generation sequencing data is still an open issue in bioinformatics, because different pipelines reach a low consensus. In this context, integrating the results of multiple calling tools has been suggested, but this is not trivial: merging, comparing, filtering and interpreting the results requires appropriate software.

Results
We developed isma (integrative somatic mutation analysis), an R package for the integrative analysis of somatic mutations detected by multiple pipelines on matched tumor-normal samples. The package provides functions to quantify the consensus, estimate the variability, highlight outliers, integrate evidence from publicly available mutation catalogues and filter sites. We illustrate the capabilities of isma by analysing breast cancer somatic mutations generated by The Cancer Genome Atlas (TCGA) with four pipelines.

Conclusions
By comparing different “points of view” on the same data, isma generates a single mutation catalogue and a series of reports that highlight common patterns, variability, and sites already catalogued by other studies (e.g. TCGA). These reports support the design and application of filtering strategies that retain more reliable sites. The package is available for non-commercial users at https://www.itb.cnr.it/isma.
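The abstract does not describe isma's R functions, so the following is not isma's API. It is a minimal, language-neutral sketch in Python of two operations the Results mention: quantifying consensus across pipelines and filtering sites by how many pipelines support them. It assumes each pipeline's output has already been reduced to a set of sites keyed by (chromosome, position, reference allele, alternate allele), and it uses the Jaccard index as one possible pairwise consensus measure. The key layout, the Jaccard choice, the pipeline names and the toy calls are all hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical site key: (chromosome, position, reference allele, alternate allele).
Site = tuple[str, int, str, str]


def consensus_counts(calls: dict[str, set[Site]]) -> Counter:
    """Count how many pipelines report each site."""
    counts: Counter = Counter()
    for sites in calls.values():
        counts.update(sites)
    return counts


def pairwise_jaccard(calls: dict[str, set[Site]]) -> dict[tuple[str, str], float]:
    """Jaccard index |A & B| / |A | B| for every pair of pipelines (one consensus measure)."""
    result = {}
    for (name_a, sites_a), (name_b, sites_b) in combinations(sorted(calls.items()), 2):
        union = sites_a | sites_b
        result[(name_a, name_b)] = len(sites_a & sites_b) / len(union) if union else 1.0
    return result


def filter_sites(calls: dict[str, set[Site]], min_pipelines: int) -> set[Site]:
    """Keep only sites reported by at least `min_pipelines` pipelines."""
    counts = consensus_counts(calls)
    return {site for site, n in counts.items() if n >= min_pipelines}


# Toy calls from three hypothetical pipelines (not real data).
calls = {
    "A": {("chr1", 100, "C", "T"), ("chr2", 200, "G", "A"), ("chr3", 300, "A", "G")},
    "B": {("chr1", 100, "C", "T"), ("chr2", 200, "G", "A")},
    "C": {("chr1", 100, "C", "T"), ("chr4", 400, "T", "C")},
}

if __name__ == "__main__":
    print(consensus_counts(calls))
    print(pairwise_jaccard(calls))
    print(filter_sites(calls, min_pipelines=2))
```

With this toy input, only the sites on chr1 and chr2 survive a threshold of two pipelines. Raising the threshold trades sensitivity for specificity, which is the kind of filtering decision the per-site consensus reports are meant to inform.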
