Splice Site Prediction Using a Sparse Network of Winnows
A wide variety of biologically relevant signals are embedded within DNA sequences. Splice sites are one class of signal: they mark the junctions between coding and non-coding regions of DNA, so splice site detection is a critical step in computational gene recognition. However, predicting splice sites is a challenging classification problem, mainly because of the overwhelming abundance of pseudo-sites and, consequently, the relatively small number of positive examples. Models of splice signals based on complex feature spaces may improve recognition accuracy; however, the relative lack of data can pose a problem for many statistical learning methods in high-dimensional feature spaces. We present a learning approach to donor and acceptor splice site prediction based on the SNoW architecture, a sparse network of classifiers implementing a variant of the multiplicative weight-update algorithm Winnow, which is known to tolerate high-dimensional feature spaces and to behave robustly in the presence of irrelevant features. These two attributes, which enable a SNoW network to incorporate many different feature types, motivated our attempt to create a SNoW-based splice site predictor that uses an assortment of features based on the local sequence context. Accuracy evaluation on several benchmark test sets of human genes indicates that SNoW-based splice site predictors compare favorably with other programs based on local sequence features. SNoW-based learning may be useful for other biological signal prediction tasks.
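As an illustration of the multiplicative weight-update idea mentioned in the abstract, the following is a minimal sketch of a single textbook Winnow unit operating on sparse binary features. It is not the SNoW implementation or the authors' feature set: the feature names (`p{i}=base`, `d{i}=dinucleotide`), the 7-base window, the threshold of 13, the promotion factor of 2, and the toy "GT at offset 3" data are all assumptions chosen for the example.

```python
import random
from collections import defaultdict

def context_features(window):
    """Names of the active binary features for one sequence window.
    Hypothetical feature set: one feature per (position, base) and
    one per (position, dinucleotide)."""
    feats = [f"p{i}={b}" for i, b in enumerate(window)]
    feats += [f"d{i}={window[i:i + 2]}" for i in range(len(window) - 1)]
    return feats

class Winnow:
    """Textbook Winnow with lazily allocated weights (all start at 1.0)."""

    def __init__(self, threshold, alpha=2.0):
        self.threshold = threshold
        self.alpha = alpha
        self.weights = defaultdict(lambda: 1.0)

    def predict(self, feats):
        # Only active features contribute, so cost scales with the
        # number of active features, not the size of the feature space.
        return sum(self.weights[f] for f in feats) >= self.threshold

    def update(self, feats, label):
        """Mistake-driven update; returns True if a mistake was made."""
        if self.predict(feats) == label:
            return False
        # Promote active weights on a missed positive, demote on a false alarm.
        factor = self.alpha if label else 1.0 / self.alpha
        for f in feats:
            self.weights[f] *= factor
        return True

def train(model, examples, max_epochs=100):
    """Repeat passes over the data until one pass makes no mistakes."""
    for epoch in range(1, max_epochs + 1):
        mistakes = sum(model.update(f, y) for f, y in examples)
        if mistakes == 0:
            return epoch
    return max_epochs

# Toy data (an assumption for illustration): positives carry "GT" at
# offsets 3-4 of a 7-base window; negatives are random windows without it.
rng = random.Random(0)
examples = []
for _ in range(200):
    w = "".join(rng.choice("ACGT") for _ in range(7))
    examples.append((context_features(w[:3] + "GT" + w[5:]), True))
    if w[3:5] != "GT":
        examples.append((context_features(w), False))

model = Winnow(threshold=13)  # each 7-base window has 13 active features
epochs = train(model, examples)
```

Because weights are touched only for features that are active in a misclassified example, the many features irrelevant to the signal are demoted geometrically while the informative ones are promoted, which is the property the abstract appeals to. A full SNoW network would hold one such unit per target class (for example donor, acceptor, and non-site) over a shared sparse feature space.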