The evolutionary gain of spliceosomal introns: sequence and phase preferences.

Theories regarding the evolution of spliceosomal introns differ in the extent to which the distribution of introns reflects either a formative role in the evolution of protein-coding genes or the adventitious gain of genetic elements. Here, systematic methods are used to assess the causes of the present-day distribution of introns in 10 families of eukaryotic protein-coding genes, comprising 1,868 introns in 488 distinct alignment positions. The history of intron evolution, inferred using a probabilistic model that allows for ancestral inheritance, gain, and loss of introns, reveals that the vast majority of introns in these eukaryotic gene families were not inherited from the most recent common ancestral genes but were gained subsequently. Furthermore, among inferred events of intron gain that meet strict criteria of reliability, the sites of gain are distributed across reading-frame phases 0, 1, and 2 in a 5:3:2 ratio and show a nucleotide preference for MAG|GT (M = A or C; positions -3 to +2 relative to the site of gain, which lies between positions -1 and +1). The nucleotide preferences of intron gain may prove to be the ultimate cause of the phase bias. The phase bias of intron gain is sufficient to account quantitatively for the well-known 5:3:2 bias in phase frequencies among extant introns, a conclusion that holds even when taxonomic heterogeneity in phase patterns is considered. Thus, intron gain accounts both for the vast majority of extant introns and for the bias toward phase 0 introns that was previously interpreted as evidence for ancient formative introns.
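
The abstract links two measurements: the phase of a site of gain and the MAG|GT context around it. What follows is a minimal Python sketch of that bookkeeping. It is not the authors' pipeline. It assumes a gap-free coding sequence (CDS) that starts at the first base of a codon, and it defines phase as the number of coding nucleotides upstream of the site, modulo 3. The function names (`candidate_sites`, `phase_counts`, `chi_square_vs_532`) and the toy sequence are hypothetical illustrations. The scan records where MAG|GT sites fall in one reading frame, which shows one way a sequence preference can become a phase preference. The chi-square helper compares any set of phase counts to the 5:3:2 ratio.

```python
from collections import Counter

# Hypothetical sketch, not the authors' pipeline.
# Assumes a gap-free CDS whose index 0 is the first base of a codon.

# IUPAC nucleotide codes used by the motif; M means A or C.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "M": "AC"}
MOTIF = "MAGGT"   # positions -3, -2, -1 | +1, +2
LEFT = 3          # motif positions upstream of the site of gain


def phase(boundary: int) -> int:
    """Phase of a site lying after `boundary` coding nucleotides.

    0 = between codons, 1 = after codon position 1, 2 = after position 2.
    """
    return boundary % 3


def matches(context: str, motif: str = MOTIF) -> bool:
    """True if every base in `context` is allowed by the IUPAC `motif`."""
    return len(context) == len(motif) and all(
        base in IUPAC[code] for base, code in zip(context, motif)
    )


def candidate_sites(cds: str) -> list[int]:
    """Boundaries b (0-based) where cds[b-3:b+2] matches MAG|GT."""
    cds = cds.upper()
    right = len(MOTIF) - LEFT
    return [
        b
        for b in range(LEFT, len(cds) - right + 1)
        if matches(cds[b - LEFT : b + right])
    ]


def phase_counts(boundaries) -> dict[int, int]:
    """Tally boundaries by phase, always reporting phases 0, 1 and 2."""
    counts = Counter(phase(b) for b in boundaries)
    return {p: counts.get(p, 0) for p in (0, 1, 2)}


def chi_square_vs_532(counts: dict[int, int]) -> float:
    """Pearson chi-square statistic of phase counts against 5:3:2.

    With 2 degrees of freedom, values above about 5.99 reject 5:3:2 at the
    5% level (large-sample approximation; unreliable for small counts).
    """
    expected_share = {0: 0.5, 1: 0.3, 2: 0.2}
    n = sum(counts.values())
    if n == 0:
        raise ValueError("no sites to test")
    return sum(
        (counts.get(p, 0) - n * s) ** 2 / (n * s)
        for p, s in expected_share.items()
    )


if __name__ == "__main__":
    # Toy in-frame CDS: ATG CAG GTT GCA GGT GCC AGG TAA
    toy = "ATGCAGGTTGCAGGTGCCAGGTAA"
    sites = candidate_sites(toy)
    print(sites)                                  # [6, 13, 20]
    counts = phase_counts(sites)
    print(counts)                                 # {0: 1, 1: 1, 2: 1}
    print(round(chi_square_vs_532(counts), 3))    # 0.444
```

Each phase places the motif across a different part of the codon frame. At phase 0, for example, the motif needs a codon that is itself MAG (CAG or AAG) followed by a codon that begins with GT. For this reason, the phase mix of available sites depends on codon usage. The sketch only counts such sites. It does not model gain, loss, or inheritance of introns.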