Characterization and prediction of mRNA polyadenylation sites in human genes

The accurate identification of potential poly(A) sites has contributed to many studies of alternative polyadenylation. The aim of this study was to develop a machine-learning methodology that helps discriminate real polyadenylation signals from randomly occurring signals in genomic sequence. Since previous studies have revealed that RNA secondary structure has a significant impact on polyadenylation in certain genes, the authors tried to computationally pinpoint common structural patterns around poly(A) sites and to investigate how RNA secondary structure may influence polyadenylation. An initial study of the impact of RNA structure, carried out with motif search tools, suggested that hairpin structures might be important. Thus, it was proposed that, in addition to the sequence pattern around poly(A) sites, there exists a widespread structural pattern that is also employed during human mRNA polyadenylation. In this study, the authors present a computational model that uses support vector machines to predict human poly(A) sites. The results show that this predictive model performs comparably to the current prediction tool. In addition, common structural patterns associated with polyadenylation were identified using several motif-finding programs, which provides new insight into the role that RNA secondary structure plays in polyadenylation.
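As a rough illustration of the kind of input such a classifier might take, the sketch below scans a DNA sequence for candidate polyadenylation signal hexamers and builds k-mer count vectors from the flanking regions. It is not the authors' feature set: the two hexamers, the flank length, the dinucleotide features, and all function names are assumptions chosen for illustration. The resulting vectors could be passed to any support vector machine implementation.

```python
from collections import Counter
from itertools import product

# Assumption: only the canonical hexamer and one common variant are scanned
# (DNA alphabet, so AAUAAA is written AATAAA).
SIGNALS = ("AATAAA", "ATTAAA")


def find_candidate_signals(seq):
    """Return start positions of every candidate signal hexamer in seq."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 5) if seq[i:i + 6] in SIGNALS]


def kmer_features(window, k=2):
    """Count overlapping k-mers in window, reported in a fixed alphabetical order."""
    counts = Counter(window[i:i + k] for i in range(len(window) - k + 1))
    return [counts["".join(p)] for p in product("ACGT", repeat=k)]


def candidate_feature_vectors(seq, flank=20, k=2):
    """Map each candidate position to upstream + downstream k-mer counts.

    flank and k are hypothetical defaults, not values taken from the study.
    """
    seq = seq.upper()
    vectors = {}
    for pos in find_candidate_signals(seq):
        upstream = seq[max(0, pos - flank):pos]
        downstream = seq[pos + 6:pos + 6 + flank]
        vectors[pos] = kmer_features(upstream, k) + kmer_features(downstream, k)
    return vectors
```

Each vector describes one candidate signal; labelling candidates as real or randomly occurring and training a classifier on these vectors would be the next step, and structural features such as predicted hairpins would need a separate folding tool.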
