Uncovering and characterizing splice variants associated with survival in lung cancer patients

Splice variants have been shown to play an important role in tumor initiation and progression and can serve as novel cancer biomarkers. However, the clinical importance of individual splice variants and the mechanisms by which they can perturb cellular functions are still poorly understood. To address these issues, we developed an efficient and robust computational method to: (1) identify splice variants that are associated with patient survival in a statistically significant manner; and (2) predict rewired protein-protein interactions that may result from altered patterns of expression of such variants. We applied our method to the lung adenocarcinoma dataset from TCGA and identified splice variants that are significantly associated with patient survival and can alter protein-protein interactions. Among these variants, several are implicated in DNA repair through homologous recombination. To computationally validate our findings, we characterized the mutational signatures in patients, grouped by low and high expression of a splice variant associated with patient survival and involved in DNA repair. The results of the mutational signature analysis are in agreement with the molecular mechanism suggested by our method. To the best of our knowledge, this is the first attempt to build a computational approach to systematically identify splice variants associated with patient survival that can also generate experimentally testable, mechanistic hypotheses. Code for identifying survival-significant splice variants using the Null Empirically Estimated P-value method can be found at https://github.com/thecodingdoc/neep. Code for construction of Multi-Granularity Graphs to discover potential rewired protein interactions can be found at https://github.com/scwest/SINBAD. Presentation slides are found at https://github.com/scwest/RECOMB-CBB_2019_NEEP. 
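The abstract names an empirically estimated null p-value but does not describe how the null distribution is built. As a rough illustration only, the sketch below shows one generic way to obtain an empirical p-value for a survival association: compute a two-group log-rank statistic, then compare it with statistics obtained after shuffling the group labels. The median expression split, the label permutation, and every function and variable name here are assumptions made for this example; this is not the NEEP implementation from the repository above.

```python
import random


def logrank_chi2(times, events, in_high):
    """Two-group log-rank chi-square statistic (1 degree of freedom)."""
    observed = expected = variance = 0.0
    # Walk over each distinct time at which at least one death occurred.
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n_high = sum(1 for i in at_risk if in_high[i])
        died = [i for i in at_risk if times[i] == t and events[i]]
        d = len(died)
        observed += sum(1 for i in died if in_high[i])
        expected += d * n_high / n
        if n > 1:
            frac = n_high / n
            variance += d * frac * (1 - frac) * (n - d) / (n - 1)
    return (observed - expected) ** 2 / variance if variance > 0 else 0.0


def empirical_p_value(times, events, expression, n_null=1000, seed=0):
    """Return (observed statistic, empirical p-value) for one variant.

    Assumption for this sketch: patients are split at the median
    expression value, and the null is built by shuffling group labels.
    """
    rng = random.Random(seed)
    median = sorted(expression)[len(expression) // 2]
    in_high = [x >= median for x in expression]
    observed = logrank_chi2(times, events, in_high)
    shuffled = list(in_high)
    null_hits = 0
    for _ in range(n_null):
        rng.shuffle(shuffled)
        if logrank_chi2(times, events, shuffled) >= observed:
            null_hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (1 + null_hits) / (1 + n_null)


if __name__ == "__main__":
    # Toy data (hypothetical): high expressers die first.
    times = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
    events = [1] * 12
    expression = [9, 8, 9, 10, 8, 9, 1, 2, 1, 2, 1, 3]
    print(empirical_p_value(times, events, expression))
```

The smallest p-value such a scheme can report is 1 / (n_null + 1), so the number of null draws bounds the achievable resolution; a transcriptome-wide screen would also need multiple-testing correction, which this sketch omits.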
Author summary

Despite many recent breakthroughs, there is still a pressing need for better ways to diagnose and treat cancer that are specific to the unique biology of the disease. Novel computational methods applied to large-scale datasets can help us reach this goal more effectively. In this work we shed light on a still poorly understood biological process that is often aberrant in cancer and that can lead to tumor formation, progression, and invasion. This process is alternative splicing: the ability of one gene to code for many different variants with distinct functions. We developed a fast and statistically robust approach to identify splice variants that are significantly associated with patient survival. Then, we computationally characterized the protein products of these splice variants by identifying potential losses and gains of protein interactions that could explain their biological role in cancer. We applied our method to a lung adenocarcinoma dataset and identified several splice variants associated with patient survival that lose biologically important interactions. We conducted case studies and computationally validated some of our results by finding mutation signatures that support the molecular mechanism suggested by our method.
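To make the idea of lost and gained protein interactions concrete, here is a minimal sketch under a simplifying assumption that is ours, not necessarily the authors': an interaction is considered possible when the two proteins carry at least one pair of domains known to interact, so a splice variant that lacks a domain may lose the partners that domain supported. All protein names, domain names, and function names below are hypothetical placeholders.

```python
def supported_partners(domains, partners, domain_pairs):
    """Partners sharing at least one interacting domain pair with `domains`."""
    return {
        name
        for name, partner_domains in partners.items()
        if any(
            (a, b) in domain_pairs or (b, a) in domain_pairs
            for a in domains
            for b in partner_domains
        )
    }


def rewired_interactions(reference_domains, variant_domains, partners, domain_pairs):
    """Compare a splice variant against the reference isoform.

    Returns the partners the variant can no longer bind ("lost") and
    those it can newly bind ("gained") under the domain-pair assumption.
    """
    reference = supported_partners(reference_domains, partners, domain_pairs)
    variant = supported_partners(variant_domains, partners, domain_pairs)
    return {"lost": reference - variant, "gained": variant - reference}


if __name__ == "__main__":
    # Hypothetical domain-domain interaction pairs.
    domain_pairs = {("DomA", "DomX"), ("DomB", "DomY")}
    # Hypothetical interaction partners and their domains.
    partners = {"PartnerX": {"DomX"}, "PartnerY": {"DomY"}}
    # The variant skips the exon encoding DomB.
    print(rewired_interactions({"DomA", "DomB"}, {"DomA"}, partners, domain_pairs))
```

A prediction of this kind is only a hypothesis about which interactions a variant might lose or gain; it says nothing about binding affinity or whether the interaction occurs in the relevant tissue.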
