Revisiting the identification of canonical splice isoforms through integration of functional genomics and proteomics evidence

Different databases define the canonical isoform of a gene as the most prevalent, most conserved, most expressed, or longest isoform, or as the one with the clearest description of domains or posttranslational modifications. In this article, we revisit these definitions of canonical isoforms based on functional genomics and proteomics evidence, focusing on mouse data. We report a novel functional relationship network-based approach for identifying the highest connected isoforms (HCIs). We show that 46% of these HCIs are not the longest transcripts. In addition, this approach revealed many genes that have more than one highly connected isoform. Averaged across 175 RNA-seq datasets covering diverse tissues and conditions, 65% of the HCIs show higher expression levels than non-highest connected isoforms at the transcript level. At the protein level, these HCIs overlap to a high degree with the expressed splice variants, based on proteomic data from eight different normal tissues. These results suggest that a more confident definition of canonical isoforms can be made through integration of multiple lines of evidence, including HCIs defined by biological processes and pathways, expression prevalence at the transcript level, and relative or absolute abundance at the protein level. This integrative proteogenomics approach can successfully identify principal isoforms that are responsible for the canonical functions of genes.
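As a rough illustration of what "highest connected" could mean in practice, the sketch below picks, for each gene, the isoform with the largest summed edge weight in an isoform-level network. The abstract does not specify the connectivity score, so the scoring rule (summed weight, ignoring edges between isoforms of the same gene), the function name `highest_connected_isoforms`, and the toy data are all assumptions made for illustration, not the authors' method.

```python
from collections import defaultdict


def highest_connected_isoforms(edges, isoform_to_gene):
    """Return {gene: isoform}, choosing per gene the isoform with the
    largest summed edge weight (an assumed connectivity score)."""
    score = defaultdict(float)
    for a, b, weight in edges:
        # Assumption: edges between isoforms of the same gene are ignored,
        # so the score reflects links to other genes only.
        if isoform_to_gene[a] == isoform_to_gene[b]:
            continue
        score[a] += weight
        score[b] += weight

    best = {}
    for isoform, gene in isoform_to_gene.items():
        s = score.get(isoform, 0.0)
        # Keep the first isoform seen when scores tie.
        if gene not in best or s > best[gene][1]:
            best[gene] = (isoform, s)
    return {gene: isoform for gene, (isoform, _) in best.items()}


# Hypothetical toy network: (isoform, isoform, functional-relationship weight).
edges = [
    ("A.1", "B.1", 0.9),
    ("A.2", "B.1", 0.2),
    ("A.1", "B.2", 0.4),
    ("A.1", "A.2", 0.8),  # same gene, skipped under the assumption above
]
isoform_to_gene = {"A.1": "A", "A.2": "A", "B.1": "B", "B.2": "B"}

hcis = highest_connected_isoforms(edges, isoform_to_gene)
print(hcis)
```

A per-gene maximum returns exactly one isoform per gene. Reporting genes with more than one highly connected isoform, as the abstract mentions, would need a threshold or a ranking rather than a single maximum.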
