Misidentification of MLL3 and other mutations in cancer due to highly homologous genomic regions

Abstract The MLL3 gene has been shown to be recurrently mutated in many malignancies, including in families with acute myeloid leukemia. We demonstrate that many MLL3 variant calls made by exome sequencing are false positives caused by misalignment to homologous regions, including a region on chr21, and can only be validated by long-range PCR. Numerous other recurrently mutated genes reported in the COSMIC and TCGA databases have pseudogenes and likewise cannot be validated by conventional short-read sequencing approaches. Genome-wide identification of pseudogene regions demonstrates that the frequency of these homologous regions increases at sequencing read lengths below 200 bp. To enable identification of poor-quality sequencing variants in prospective studies, we generated novel genome-wide maps of regions with poor mappability that can be used in variant calling algorithms. Taken together, our findings reveal that pseudogene regions are a source of false-positive mutations in cancers.
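As an illustration of how a poor-mappability map might be applied to variant calls, the following is a minimal sketch and not the authors' pipeline. It assumes the map is available as BED-like half-open intervals (chromosome, 0-based start, exclusive end) and that variant positions are 0-based; the function names and all coordinates are hypothetical.

```python
from bisect import bisect_right


def build_index(regions):
    """Merge overlapping (chrom, start, end) half-open intervals per chromosome.

    Returns {chrom: (starts, ends)} with both lists sorted, so that a
    position can be located by binary search.
    """
    by_chrom = {}
    for chrom, start, end in regions:
        by_chrom.setdefault(chrom, []).append((start, end))

    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        merged = [intervals[0]]
        for start, end in intervals[1:]:
            last_start, last_end = merged[-1]
            if start <= last_end:
                # Overlapping or adjacent: extend the previous interval.
                merged[-1] = (last_start, max(last_end, end))
            else:
                merged.append((start, end))
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def in_low_mappability(index, chrom, pos):
    """Return True if the 0-based position lies inside a flagged interval."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
    return i >= 0 and pos < ends[i]


def split_calls(variants, index):
    """Split (chrom, pos, ...) variant tuples into (kept, flagged) lists."""
    kept, flagged = [], []
    for variant in variants:
        chrom, pos = variant[0], variant[1]
        if in_low_mappability(index, chrom, pos):
            flagged.append(variant)
        else:
            kept.append(variant)
    return kept, flagged


# Hypothetical intervals and calls, for illustration only.
example_regions = [("chr7", 100, 200), ("chr7", 150, 300), ("chr21", 10, 20)]
example_index = build_index(example_regions)
example_calls = [("chr7", 120, "A>G"), ("chr7", 400, "C>T"), ("chr1", 15, "G>A")]
example_kept, example_flagged = split_calls(example_calls, example_index)
```

Flagged calls are set aside for orthogonal validation instead of being discarded outright, in keeping with the abstract's point that such variants need confirmation by a method such as long-range PCR.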
