Indel detection from DNA and RNA sequencing data with transIndel

Background: Insertions and deletions (indels) are a major class of genomic variation associated with human disease. Indels are primarily detected from DNA sequencing (DNA-seq) data, but their transcriptional consequences remain unexplored because medium-sized and large indels are difficult to discriminate from splicing events in RNA-seq data.

Results: Here, we developed transIndel, a splice-aware algorithm that parses the chimeric alignments predicted by a short-read aligner and reconstructs medium-sized insertions and large deletions from the linear alignments of split reads in DNA-seq or RNA-seq data. TransIndel exhibits competitive or superior performance compared with eight state-of-the-art indel detection tools on benchmarks using both synthetic and real DNA-seq data. Additionally, we applied transIndel to DNA-seq and RNA-seq datasets from 333 primary prostate cancer patients from The Cancer Genome Atlas (TCGA) and 59 metastatic prostate cancer patients from the AACR-PCF Stand-Up-To-Cancer (SU2C) studies. TransIndel enhanced the taxonomy of DNA- and RNA-level alterations in prostate cancer by identifying recurrent FOXA1 indels as well as exitron splicing in genes implicated in disease progression.

Conclusions: Our study demonstrates that transIndel is a robust tool for detecting medium- and large-sized indels in DNA-seq and RNA-seq data. Including RNA-seq in indel discovery efforts significantly improves sensitivity for medium-sized and large indels missed by DNA-seq, and reveals non-canonical RNA-splicing events in genes associated with disease pathology.
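The core idea of reconstructing an indel from the linear alignments of a split read can be illustrated with a minimal sketch. This is not transIndel's implementation. It assumes a read that the aligner has split into two pieces (for example, a primary alignment plus the supplementary alignment listed in a BWA-MEM `SA` tag), uses 0-based half-open coordinates, and introduces the hypothetical names `Segment` and `infer_indel` purely for illustration. It does not perform the splice-aware filtering that RNA-seq data require.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


@dataclass
class Segment:
    """One linear alignment of a split read (hypothetical helper type)."""
    chrom: str
    ref_start: int  # 0-based leftmost aligned reference position
    cigar: str
    reverse: bool

    def spans(self) -> Tuple[int, int, int, int]:
        """Return (query_start, query_end, ref_start, ref_end), half-open."""
        ops = [(int(n), op) for n, op in CIGAR_RE.findall(self.cigar)]
        q_start = 0
        for n, op in ops:  # leading soft/hard clips shift the query start
            if op in "SH":
                q_start += n
            else:
                break
        q_len = sum(n for n, op in ops if op in "MI=X")   # query-consuming ops
        r_len = sum(n for n, op in ops if op in "MDN=X")  # reference-consuming ops
        return q_start, q_start + q_len, self.ref_start, self.ref_start + r_len


def infer_indel(a: Segment, b: Segment) -> Optional[Tuple[str, int, int]]:
    """Infer (kind, ref_position, length) from two pieces of one read.

    Returns None when the pieces do not look like a simple indel
    (different chromosome or strand, or out of order on the reference).
    """
    if a.chrom != b.chrom or a.reverse != b.reverse:
        return None  # translocation or inversion, not an indel
    pa, pb = a.spans(), b.spans()
    if pb[0] < pa[0]:  # order the two pieces along the read
        pa, pb = pb, pa
    _, qa_end, ra_start, ra_end = pa
    qb_start, _, rb_start, _ = pb
    if rb_start < ra_start:
        return None  # pieces out of order on the reference
    gap_read = qb_start - qa_end  # negative if both pieces claim the same bases
    gap_ref = rb_start - ra_end
    net = gap_ref - gap_read  # net reference bases skipped by the read
    if net > 0:
        return ("DEL", ra_end, net)   # position not left-normalized
    if net < 0:
        return ("INS", ra_end, -net)
    return None
```

Computing the event size as the reference gap minus the read gap absorbs microhomology, where both pieces align the same few read bases. For example, a read aligned as `60M40S` at position 1000 with a supplementary piece `60H40M` at 1560 yields a 500 bp deletion after position 1060. Events that combine a substitution with an indel collapse to their net length in this simplified version.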
