Identification of plant messenger RNA polyadenylation sites using length-variable second order Markov model

In this paper we adopt a length-variable second-order Markov model to identify plant messenger RNA poly(A) sites, providing a general method that relies only on experimentally determined sequences. The model achieves up to 92% sensitivity and 79% specificity. The method is particularly suitable for predicting poly(A) sites for which prior biological knowledge is lacking and whose signals are poorly conserved, as well as for identifying alternative poly(A) sites in different genetic regions. Our model is more versatile than other algorithms: the generalized hidden Markov model requires the signal distributions, and AdaBoost requires the construction of signal features around the sites.
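The general idea of scoring a sequence window with a second-order Markov model can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not describe how the window length is varied, so the sketch uses plain fixed-order chains, and the function names, the pseudocount, and the toy sequences are all hypothetical. One model is trained on windows around known poly(A) sites and one on background windows; a candidate is scored by the log-likelihood ratio of the two. Sensitivity and specificity are computed here with the standard definitions TP/(TP+FN) and TN/(TN+FP), which the abstract does not confirm.

```python
import math
from collections import defaultdict
from itertools import product

ALPHABET = "ACGT"  # assumption: sequences are uppercase DNA with no ambiguity codes


def train_second_order(seqs, pseudocount=1.0):
    """Estimate P(x_i | x_{i-2} x_{i-1}) with add-k smoothing."""
    counts = defaultdict(lambda: defaultdict(float))
    for s in seqs:
        for i in range(2, len(s)):
            counts[s[i - 2:i]][s[i]] += 1.0
    model = {}
    for ctx in ("".join(p) for p in product(ALPHABET, repeat=2)):
        total = sum(counts[ctx][b] for b in ALPHABET) + pseudocount * len(ALPHABET)
        model[ctx] = {b: (counts[ctx][b] + pseudocount) / total for b in ALPHABET}
    return model


def log_likelihood(model, seq):
    """Log-probability of seq given its first two bases."""
    return sum(math.log(model[seq[i - 2:i]][seq[i]]) for i in range(2, len(seq)))


def log_odds(pos_model, neg_model, seq):
    """Positive score: seq looks more like the site model than the background."""
    return log_likelihood(pos_model, seq) - log_likelihood(neg_model, seq)


def sensitivity_specificity(labels, predictions):
    """Standard definitions (assumed): Sn = TP/(TP+FN), Sp = TN/(TN+FP)."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    sn = tp / (tp + fn) if (tp + fn) else 0.0
    sp = tn / (tn + fp) if (tn + fp) else 0.0
    return sn, sp


# Toy data, invented purely for illustration (not real poly(A) sequences).
toy_site_windows = ["TTTTTTTTTT", "TTTATTTTTT"]
toy_background_windows = ["GCGCGCGCGC", "GGCCGGCCGG"]

site_model = train_second_order(toy_site_windows)
background_model = train_second_order(toy_background_windows)

# A window is called a site when its log-odds score exceeds a threshold (0 here).
candidates = ["TTTTTTTT", "GCGCGCGC"]
calls = [1 if log_odds(site_model, background_model, c) > 0 else 0 for c in candidates]
```

In practice the decision threshold would be tuned on held-out data to trade sensitivity against specificity; the value 0 above is only a placeholder.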
