Implementation of a classification-based prediction model for plant mRNA poly(A) sites

The poly(A) site of a messenger RNA (mRNA) defines the end of a transcript during eukaryotic gene expression. Finding poly(A) sites in genome sequences can help to annotate the ends of genes and to predict alternative polyadenylation. However, predicting plant poly(A) sites computationally is challenging because the sequence signals that determine them are weak. Here we describe a classification-based model for recognizing plant poly(A) sites. First, several feature representation methods, such as factorial moments, M encoding, and weight of signal patterns, are adopted to describe the nucleotide composition of poly(A) signals. Then, a model is trained with one of several classification algorithms, such as a Bayesian network, and the trained model is used to predict plant mRNA poly(A) sites. Compared with PASS, the plant poly(A) site prediction software that we developed previously, the recognition model introduced here offers better performance, flexibility, and extensibility.
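
The two-stage workflow described above (encode each candidate sequence as a feature vector, then train a classifier and apply it to unseen sequences) can be illustrated with a minimal sketch. It is not the model described here: plain overlapping k-mer counts stand in for the factorial-moment, M-encoding, and signal-weight features, and a small multinomial naive Bayes classifier stands in for the Bayesian network. The function and class names, the toy sequences, and their labels are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def kmer_counts(seq, k=2):
    """Count overlapping k-mers in seq, reported in a fixed order.

    Assumption: simple k-mer counts are a stand-in for the feature
    representations named in the text, not a reimplementation of them.
    """
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts[m] for m in kmers]


class MultinomialNB:
    """Tiny multinomial naive Bayes with add-one smoothing.

    Assumption: used here as a stand-in classifier, not the Bayesian
    network mentioned in the text.
    """

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.log_prior = {}
        self.log_like = {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(rows) / len(X))
            # Per-feature totals for this class, with add-one smoothing.
            totals = [sum(col) + 1 for col in zip(*rows)]
            denom = sum(totals)
            self.log_like[c] = [math.log(t / denom) for t in totals]
        return self

    def predict(self, X):
        def score(x, c):
            weights = self.log_like[c]
            return self.log_prior[c] + sum(v * w for v, w in zip(x, weights))

        return [max(self.classes, key=lambda c: score(x, c)) for x in X]


# Hypothetical toy data: label 1 = window around a poly(A) site,
# label 0 = background sequence. Real training data would come from
# experimentally supported poly(A) sites.
train_seqs = [
    "AATAAATTTTATTA",
    "TTATAAATATTTAA",
    "GCGCCGGCGCGGCC",
    "CCGGGCGCCGCGGC",
]
train_labels = [1, 1, 0, 0]

model = MultinomialNB().fit([kmer_counts(s) for s in train_seqs], train_labels)
predictions = model.predict(
    [kmer_counts("ATTAAATATTTTAT"), kmer_counts("GGCCGCGGCGCCGG")]
)
print(predictions)
```

Keeping the encoder and the classifier behind separate interfaces, as in this sketch, is one way to obtain the kind of flexibility mentioned above: either stage can be swapped without touching the other.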