Stochastic segment models of eukaryotic promoter regions.

We present a new statistical approach to the recognition of eukaryotic RNA polymerase II promoters. We apply stochastic segment models in which each state represents a functional part of the promoter; the segments are trained in an unsupervised way. We compare segment models with three and five states against our previous system, which modeled each promoter as a whole, i.e., as a single state. Classification results on a representative collection of human and Drosophila melanogaster promoter and non-promoter sequences show substantial improvements over the single-state system. We demonstrate the practical importance of the approach by mining large contiguous sequences for promoters.
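
To make the idea of a segment model concrete, the following is a minimal sketch of how a sequence can be divided into an ordered series of segments, one per state, by dynamic programming. It is an illustration under stated assumptions, not the system described above: the three states, their names, the nucleotide probabilities, the toy sequence, and the function names are all hypothetical, each state emits bases independently, and no segment duration model is used. The sketch covers only the segmentation step with fixed parameters; an unsupervised training scheme would additionally re-estimate the parameters, which is not shown.

```python
import math

# Hypothetical per-state nucleotide distributions (illustrative values only,
# not taken from the paper). One state per assumed promoter part.
STATES = [
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},  # assumed "upstream" state
    {"A": 0.45, "C": 0.05, "G": 0.05, "T": 0.45},  # assumed AT-rich state
    {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},  # assumed GC-rich state
]


def segment_log_prob(seq, start, end, emissions):
    """Log-probability of seq[start:end] under one state's emissions."""
    return sum(math.log(emissions[base]) for base in seq[start:end])


def best_segmentation(seq, states):
    """Split seq into len(states) non-empty, ordered, contiguous segments
    that maximize the total log-probability (segmental Viterbi)."""
    n, k = len(seq), len(states)
    neg_inf = float("-inf")
    # score[j][t]: best log-prob of seq[:t] explained by the first j states.
    score = [[neg_inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    score[0][0] = 0.0
    for j in range(1, k + 1):
        for t in range(j, n + 1):
            for s in range(j - 1, t):  # s is where segment j starts
                if score[j - 1][s] == neg_inf:
                    continue
                cand = score[j - 1][s] + segment_log_prob(seq, s, t, states[j - 1])
                if cand > score[j][t]:
                    score[j][t] = cand
                    back[j][t] = s
    # Trace the segment boundaries back from the end of the sequence.
    bounds, t = [], n
    for j in range(k, 0, -1):
        s = back[j][t]
        bounds.append((s, t))
        t = s
    bounds.reverse()
    return score[k][n], bounds


# Toy sequence, made up for this example.
SEQ = "ACGTACGC" + "TATAAATA" + "GCGCGCGC"
log_prob, bounds = best_segmentation(SEQ, STATES)
print(bounds)  # [(0, 8), (8, 16), (16, 24)]
```

In this toy case the recovered boundaries coincide with the three blocks from which the sequence was built. A classifier along these lines would compare the resulting log-probability with that of a non-promoter model; that comparison is likewise an assumption here and is left out of the sketch.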