Alternative splicing and protein function

Background: Alternative splicing is a major mechanism for generating protein diversity in higher eukaryotes. Although at least half, and probably more, of mammalian genes are alternatively spliced, it has been unclear whether the frequency of alternative splicing is the same in different functional categories. The problem is obscured by uneven coverage of genes by ESTs and by a large number of artifacts in the EST data.

Results: We have developed a method that generates possible mRNA isoforms for human genes contained in the EDAS database, taking into account the effects of nonsense-mediated decay and translation initiation rules, and a procedure for offsetting the effects of uneven EST coverage. We then computed the number of mRNA isoforms for genes from different functional categories. Genes encoding ribosomal proteins and genes in the category "Small GTPase-mediated signal transduction" tend to have fewer isoforms than average, whereas genes in the category "DNA replication and chromosome cycle" have more isoforms than average. Genes encoding proteins involved in protein-protein interactions tend to be alternatively spliced more often than genes encoding non-interacting proteins, although among alternatively spliced genes there is no significant difference in the number of isoforms.

Conclusion: Filtering for functional isoforms that satisfy biological constraints and accounting for uneven EST coverage allowed us to describe differences in alternative splicing of genes from different functional categories. The observations seem to be consistent with expectations based on current biological knowledge: fewer isoforms for ribosomal and signal transduction proteins, and more alternative splicing of interacting and cell cycle proteins.
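The abstract does not spell out how the nonsense-mediated decay (NMD) filter works, so the following is only a minimal sketch of one common heuristic, not the authors' procedure. It assumes the widely used rule that a transcript is a likely NMD target when its stop codon ends more than about 50 nucleotides upstream of the last exon-exon junction. The function names, the `(exon_lengths, stop_codon_end)` representation of an isoform, and the 50-nt threshold are all illustrative assumptions.

```python
from typing import List, Sequence, Tuple

# Hypothetical isoform representation: exon lengths in 5'->3' order, plus the
# mRNA coordinate (0-based, exclusive) at which the stop codon ends.
Isoform = Tuple[Sequence[int], int]


def junction_positions(exon_lengths: Sequence[int]) -> List[int]:
    """Return mRNA coordinates of exon-exon junctions (cumulative exon lengths)."""
    positions = []
    total = 0
    for length in exon_lengths[:-1]:  # the last exon has no downstream junction
        total += length
        positions.append(total)
    return positions


def is_nmd_candidate(exon_lengths: Sequence[int], stop_codon_end: int,
                     threshold: int = 50) -> bool:
    """Flag an isoform whose stop codon lies far upstream of the last junction.

    The threshold of 50 nt is an assumed value for the common heuristic,
    not a parameter taken from the paper.
    """
    junctions = junction_positions(exon_lengths)
    if not junctions:
        return False  # single-exon transcript: no junction to trigger the rule
    return junctions[-1] - stop_codon_end > threshold


def filter_isoforms(isoforms: Sequence[Isoform], threshold: int = 50) -> List[Isoform]:
    """Keep only isoforms that are not flagged as NMD candidates."""
    return [iso for iso in isoforms
            if not is_nmd_candidate(iso[0], iso[1], threshold)]
```

For example, with exon lengths `[100, 200, 150]` the last junction is at position 300, so an isoform whose stop codon ends at position 200 would be flagged, while one whose stop codon ends at position 280 or inside the last exon would be kept.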
