Comparative Analysis of Serine/Arginine-Rich Proteins across 27 Eukaryotes: Insights into Sub-Family Classification and Extent of Alternative Splicing

Alternative splicing (AS) of pre-mRNA is a fundamental molecular process that generates diversity in the transcriptomes and proteomes of eukaryotic organisms. SR proteins are a family of splicing regulators with one or two RNA recognition motifs (RRMs) at the N-terminus and an arginine/serine-rich (RS) domain at the C-terminus, and they function in both constitutive and alternative splicing. We identified SR proteins in 27 eukaryotic species, including plants, animals, fungi and “basal” eukaryotes that lie outside these lineages. Using the RRMs as phylogenetic markers, we classified 272 SR genes into robust sub-families: the SR gene family splits into five major groupings, which can be further separated into 11 distinct sub-families. Most flowering plants have twice, or nearly twice, as many SR genes as vertebrates. The majority of plant SR genes are under purifying selection. Moreover, for every pair of paralogous SR genes in Arabidopsis, rice, soybean and maize, one of the two paralogs is preferentially expressed throughout plant development. We also assessed the extent of AS in SR genes using a splice graph approach (http://combi.cs.colostate.edu/as/gmap_SRgenes). AS of SR genes is widespread across multiple lineages, with alternative 3′ or 5′ splice-site events being the most prominent type. Notably, AS is more frequent in the plant-enriched sub-families, where 57%–88% of SR genes undergo some type of AS, than in the other sub-families, where the figure is 40%–54%. The SR gene family is pervasive across multiple eukaryotic lineages and conserved in sequence and domain organization, but its gene number differs across lineages, with SR genes especially abundant in flowering plants. The higher number of alternatively spliced SR genes in plants underscores the importance of AS in generating splice variants in these organisms.
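The abstract describes the splice graph analysis only at a high level. As a rough illustration of what counts as an alternative 5′ or 3′ splice-site event, the Python sketch below groups a gene's introns (the donor-to-acceptor edges of a splice graph) by the splice site they share. It is a simplified sketch under stated assumptions, not the pipeline behind the URL above: the function names, the `(start, end)` coordinate convention and the toy coordinates are all hypothetical. It also ignores exon skipping and intron retention, so an intron that skips a whole exon while sharing a donor would be miscounted here as an alternative 3′ event; a full splice graph classifier checks for that case.

```python
from collections import defaultdict


def alt_splice_site_events(introns, strand="+"):
    """Group introns that share one splice site but differ at the other.

    Hypothetical helper. introns: iterable of (start, end) genomic
    coordinates with start < end. Returns {"alt5": {...}, "alt3": {...}},
    each mapping the shared site to the sorted alternative partner sites.
    """
    by_start = defaultdict(set)  # intron start -> set of intron ends
    by_end = defaultdict(set)    # intron end -> set of intron starts
    for start, end in set(introns):
        by_start[start].add(end)
        by_end[end].add(start)

    # On the + strand the intron start is the donor (5' splice site) and the
    # end is the acceptor (3' splice site); on the - strand the roles swap.
    if strand == "+":
        donor_to_acceptors, acceptor_to_donors = by_start, by_end
    else:
        donor_to_acceptors, acceptor_to_donors = by_end, by_start

    return {
        # One acceptor reached from several donors: alternative 5' splice site.
        "alt5": {a: sorted(d) for a, d in sorted(acceptor_to_donors.items()) if len(d) > 1},
        # One donor joined to several acceptors: alternative 3' splice site.
        "alt3": {d: sorted(a) for d, a in sorted(donor_to_acceptors.items()) if len(a) > 1},
    }


def percent_genes_with_alt_sites(genes):
    """Percentage of genes (gene_id -> (strand, introns)) with any alt 5'/3' event."""
    if not genes:
        return 0.0
    hits = sum(
        1
        for strand, introns in genes.values()
        if any(alt_splice_site_events(introns, strand).values())
    )
    return 100.0 * hits / len(genes)


if __name__ == "__main__":
    # Toy, made-up coordinates for illustration only.
    toy = {
        "geneA": ("+", [(100, 200), (100, 250), (300, 400), (320, 400)]),
        "geneB": ("+", [(50, 90), (150, 220)]),
    }
    print(alt_splice_site_events(toy["geneA"][1]))
    # {'alt5': {400: [300, 320]}, 'alt3': {100: [200, 250]}}
    print(percent_genes_with_alt_sites(toy))  # 50.0
```

Grouping by the shared site keeps the check linear in the number of introns. The per-gene percentage mirrors, in toy form, the kind of per-sub-family proportion the abstract reports.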
