ISOGO: Functional annotation of protein-coding splice variants

The advent of RNA-seq technologies has shifted the paradigm of genetic analysis from a genome-based to a transcriptome-based perspective. Alternative splicing generates functional diversity in genes, but the precise functions of many individual isoforms are yet to be elucidated. Gene Ontology (GO) was developed to annotate gene products according to their biological processes, molecular functions and cellular components. Although a single gene may have several gene products, most annotations are not isoform-specific and do not distinguish the functions of the different proteins originating from a single gene. Several approaches have tried to automatically annotate ontologies at the isoform level, but this has proven to be a daunting task. We have developed ISOGO (ISOform + GO function imputation), a novel algorithm to predict the function of coding isoforms based on their protein domains and the correlation of their expression across 11,373 cancer patients. Combining these two sources of information outperforms previous approaches: it provides an area under the precision-recall curve (AUPRC) five times larger than previous attempts, and the median AUROC of the functions assigned to genes is 0.82. We tested ISOGO predictions on several genes with isoform-specific functions (BRCA1, MADD, VAMP7 and ITSN1), and they were consistent with the literature. In addition, we examined whether the main isoform of each gene, as predicted by APPRIS, was the most likely to have the annotated gene functions; this was the case in 99.4% of the genes. We also evaluated the predictions for isoform-specific functions provided by the CAFA3 challenge, and the results were also convincing. To make these results available to the scientific community, we have deployed a web application to consult ISOGO predictions (https://biotecnun.unav.es/app/isogo). Initial data, website link, isoform-specific GO function predictions and R code are available at https://gitlab.com/icassol/isogo.
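For readers unfamiliar with the two metrics quoted above, the following is a minimal Python sketch of how AUROC and a step-wise AUPRC estimate (average precision) can be computed for one GO term from per-isoform scores. Everything in it is an illustrative assumption: the labels, the `domain_score` and `expr_score` values, and the simple averaging used to combine them are invented for the example and are not ISOGO's actual model, data or results (the authors' implementation is in R).

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative count as half a correct pair.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    correct = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return correct / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise AUPRC estimate: mean of the precision at each positive's rank."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive label")
    return precision_sum / hits


# Toy example for a single hypothetical GO term (all values invented).
labels = [1, 1, 0, 0, 1, 0]                    # 1 = isoform's gene carries the term
domain_score = [0.9, 0.2, 0.3, 0.1, 0.6, 0.5]  # hypothetical domain-based evidence
expr_score = [0.4, 0.8, 0.5, 0.3, 0.7, 0.2]    # hypothetical expression-correlation evidence

# Assumption: combine the two evidence sources by a plain average.
combined = [(d + e) / 2 for d, e in zip(domain_score, expr_score)]

for name, scores in [("domains", domain_score),
                     ("expression", expr_score),
                     ("combined", combined)]:
    print(f"{name}: AUROC={auroc(labels, scores):.3f} "
          f"AP={average_precision(labels, scores):.3f}")
```

AUROC depends only on how positives rank relative to negatives, whereas average precision weights errors near the top of the ranking more heavily. That difference is one reason precision-recall summaries are often reported alongside AUROC when positives are rare, as is typical for GO term annotation.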
