A proteogenomic approach to understand splice isoform functions through sequence and expression-based computational modeling

Multi-exon genes produce a mixture of alternatively spliced isoforms, and the proteins translated from them can have similar, different or even opposing functions. It is therefore essential to differentiate and annotate the functions of individual isoforms. Computational approaches provide an efficient complement to expensive and time-consuming experimental studies. The input data for these methods range from DNA sequence, RNA selection pressure, expressed sequence tags and full-length complementary DNA to exon array data, RNA-seq expression and proteomic data. Notably, RNA-seq technology provides quantitative profiles of transcript expression at the genome scale, making an unprecedented amount of expression data available for developing isoform function prediction methods. Integrative analysis of these data at different molecular levels enables a proteogenomic approach to systematically interrogate isoform functions. Here, we briefly review the state-of-the-art methods according to their input data sources, discuss their advantages and limitations, and point out potential ways to improve prediction accuracy.
