Comprehensive Analysis of Transcriptome Variation Uncovers Known and Novel Driver Events in T-Cell Acute Lymphoblastic Leukemia

RNA-seq is a promising technology for re-sequencing protein-coding genes to identify single-nucleotide variants (SNVs), while simultaneously obtaining information on structural variations and gene expression perturbations. We asked whether RNA-seq is suitable for the detection of driver mutations in T-cell acute lymphoblastic leukemia (T-ALL). These leukemias are caused by a combination of gene fusions, over-expression of transcription factors, and cooperating point mutations in oncogenes and tumor suppressor genes. We analyzed 31 T-ALL patient samples and 18 T-ALL cell lines by high-coverage paired-end RNA-seq. First, we optimized the detection of SNVs in RNA-seq data by comparing the results with exome re-sequencing data. We identified known driver genes with recurrent protein-altering variants, as well as several new candidates, including H3F3A, PTK2B, and STAT5B. Next, we determined accurate gene expression levels from the RNA-seq data through normalization and batch effect removal, and used these to classify patients into T-ALL subtypes. Finally, we detected gene fusions. Several of these can explain the over-expression of key driver genes such as TLX1, PLAG1, LMO1, or NKX2-1, while others result in novel fusion transcripts that encode activated kinases (SSBP2-FER and TPM3-JAK2) or involve MLLT10. In conclusion, we present novel analysis pipelines for variant calling, variant filtering, and expression normalization on RNA-seq data, and we applied them successfully to detect translocations, point mutations, INDELs, exon-skipping events, and expression perturbations in T-ALL.
