The Power of Amnesia
We propose a learning algorithm for a variable memory length Markov process. Human communication, whether given as text, handwriting, or speech, has multiple characteristic time scales. On short scales it is characterized mostly by the dynamics that generate the process, whereas on longer scales it carries more syntactic and semantic information. For that reason the conventionally used fixed memory Markov models cannot effectively capture the complexity of such structures. On the other hand, using long memory models uniformly is impractical even for a memory length as short as four. The algorithm we propose minimizes the statistical prediction error by extending the memory, or state length, adaptively, until the total prediction error is sufficiently small. We demonstrate the algorithm by learning the structure of natural English text and applying the learned model to the correction of corrupted text. Using fewer than 3000 states, the model's performance is far superior to that of fixed memory models with a similar number of states. We also show how the algorithm can be applied to intergenic E. coli DNA base prediction with results comparable to HMM-based methods.
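The adaptive memory extension the abstract describes can be sketched as growing a prediction suffix tree: a context (state) is lengthened by one symbol into the past only when the longer context's next-symbol distribution differs enough from its parent's to improve prediction. The sketch below is a minimal, hedged reconstruction of that idea; the names `build_pst`, the KL divergence criterion, and the thresholds `min_count` and `kl_threshold` are illustrative assumptions, not the paper's exact procedure or parameters.

```python
# Sketch of adaptive context extension via a prediction suffix tree.
# All function and parameter names here are hypothetical illustrations.
from collections import defaultdict, Counter
import math


def build_pst(text, max_depth=4, min_count=2, kl_threshold=0.05):
    """Grow a set of variable-length contexts: keep a longer context only
    when its next-symbol distribution diverges (KL) from its parent's,
    i.e. extend memory only where it measurably helps prediction."""
    alphabet = set(text)

    # Count next-symbol occurrences for every context up to max_depth.
    counts = defaultdict(Counter)
    for i in range(len(text)):
        for d in range(max_depth + 1):
            if i - d < 0:
                break
            counts[text[i - d:i]][text[i]] += 1

    def dist(ctx):
        c = counts[ctx]
        total = sum(c.values())
        return {s: c[s] / total for s in alphabet if c[s] > 0}

    states = {""}          # the empty context is the memoryless root model
    frontier = [""]
    while frontier:
        ctx = frontier.pop()
        p_parent = dist(ctx)
        for sym in alphabet:
            child = sym + ctx  # extend the context one symbol into the past
            if len(child) > max_depth or sum(counts[child].values()) < min_count:
                continue
            p_child = dist(child)
            # KL(child || parent): gain from remembering one more symbol.
            kl = sum(p * math.log(p / p_parent[s])
                     for s, p in p_child.items() if s in p_parent)
            if kl > kl_threshold:
                states.add(child)
                frontier.append(child)
    return states
```

On a purely periodic string such as `"ab" * 50`, the single-symbol contexts `"a"` and `"b"` already predict the next symbol perfectly, so the tree stops extending there; longer contexts add no divergence and are pruned, illustrating how memory grows only where it pays off.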