ITD assembler: an algorithm for internal tandem duplication discovery from short-read sequencing data

Background: Detection of tandem duplication within coding exons, referred to as internal tandem duplication (ITD), remains challenging because ITD-containing reads align poorly to the reference genome. There is a critical need for efficient methods to recover these important mutational events.

Results: In this paper we introduce ITD Assembler, a novel approach that rapidly evaluates all unmapped and partially mapped reads from whole-exome NGS data. It uses a de Bruijn graph approach to select reads that harbor cycles of appropriate length, then assembles the selected reads using overlap-layout-consensus. We tested ITD Assembler using The Cancer Genome Atlas AML dataset as a truth set. ITD Assembler identified the highest percentage of reported FLT3-ITDs when compared with other ITD detection algorithms, and discovered additional ITDs in FLT3, KIT, CEBPA, WT1 and other genes. Evidence of polymorphic ITDs in 54 genes was also found. Novel ITDs were validated by analyzing the corresponding RNA sequencing data.

Conclusions: ITD Assembler is a highly sensitive tool that can detect partial, large and complex tandem duplications. This study highlights the need to look more effectively for ITDs in other cancers and Mendelian diseases.
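The read-selection idea in the Results can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a cycle in the de Bruijn graph of a single read can be found by looking for a k-mer that occurs twice, and that the distance between the two occurrences gives the cycle length. The function names and the values of `k`, `min_len` and `max_len` are hypothetical and chosen only for illustration; the abstract does not state them. The overlap-layout-consensus assembly step is not shown.

```python
def kmer_cycle_lengths(read, k):
    """Return the set of distances between repeated k-mers in one read.

    Walking a read through its own de Bruijn graph, a k-mer that is seen a
    second time means the walk has returned to an edge it already used, so
    the graph contains a cycle. The distance between the two occurrences is
    taken here as the cycle length (a candidate duplication length).
    """
    last_seen = {}   # k-mer -> index of its most recent occurrence
    lengths = set()
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        if kmer in last_seen:
            lengths.add(i - last_seen[kmer])
        last_seen[kmer] = i
    return lengths


def has_candidate_cycle(read, k=8, min_len=15, max_len=300):
    """True if the read has a cycle whose length is in [min_len, max_len].

    The thresholds are illustrative assumptions, not values from the paper.
    """
    return any(min_len <= d <= max_len for d in kmer_cycle_lengths(read, k))


# Toy data: a 20 bp unit duplicated in tandem between two 10 bp flanks.
unit = "ACGTTGCAAGGCTTACGATC"
itd_read = "TTTTGGGACC" + unit + unit + "CCATGAAATG"
normal_read = "TTTTGGGACC" + unit + "CCATGAAATG"

itd_cycles = kmer_cycle_lengths(itd_read, 8)
normal_cycles = kmer_cycle_lengths(normal_read, 8)
```

In this toy case the duplicated read yields a single cycle length equal to the size of the repeated unit, while the control read yields none. Real reads would also need handling for sequencing errors and low-complexity sequence, which this sketch ignores.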
