Paper information - The Power of Selective Memory: Self-Bounded Learning of Prediction Suffix Trees

Prediction suffix trees (PSTs) are a popular and effective tool for tasks such as compression, classification, and language modeling. In this paper we take a decision-theoretic view of PSTs for the task of sequence prediction. Generalizing the notion of margin to PSTs, we present an online PST learning algorithm and derive a loss bound for it. The depth of the PST generated by this algorithm scales linearly with the length of the input. We then describe a self-bounded enhancement of our learning algorithm that automatically grows a bounded-depth PST, and we prove an analogous mistake bound for the self-bounded algorithm. The result is an efficient algorithm that neither relies on a priori assumptions about the shape or maximal depth of the target PST nor requires any parameters. To our knowledge, this is the first provably correct PST learning algorithm that generates a bounded-depth PST while being competitive with any fixed PST determined in hindsight.
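
To make the margin view of a PST concrete, here is a minimal Python sketch of how a suffix tree can score a binary sequence. It is an illustration under stated assumptions, not the paper's algorithm: the dictionary representation, the function names `pst_score`, `predict`, and `margin`, and the per-level discount of 2^(-1/2) are all hypothetical choices made for this example, and no learning update is shown.

```python
from typing import Dict, Sequence, Tuple

# A context is a suffix of the history, oldest symbol first.
# Symbols and labels are assumed to be +1 / -1 (assumption for this sketch).
Context = Tuple[int, ...]


def pst_score(tree: Dict[Context, float],
              history: Sequence[int],
              decay: float = 2 ** -0.5) -> float:
    """Sum the discounted node values along the path of matching suffixes.

    `tree` maps each context (suffix) to a real value and is assumed to be
    suffix-closed: if a context is present, so are all of its shorter suffixes.
    `decay` is a hypothetical per-level discount, not taken from the abstract.
    """
    score = tree.get((), 0.0)  # root node: empty context
    for depth in range(1, len(history) + 1):
        suffix = tuple(history[-depth:])
        if suffix not in tree:
            break  # no deeper node can match once a suffix is missing
        score += (decay ** depth) * tree[suffix]
    return score


def predict(tree: Dict[Context, float], history: Sequence[int]) -> int:
    """Predict the next symbol as the sign of the score (ties go to +1)."""
    return 1 if pst_score(tree, history) >= 0 else -1


def margin(tree: Dict[Context, float], history: Sequence[int], label: int) -> float:
    """Signed margin: positive exactly when the prediction agrees with `label`
    (ignoring ties at zero)."""
    return label * pst_score(tree, history)


# Hypothetical tree with three nodes: root, context (1,), and context (-1, 1).
example_tree = {(): 0.0, (1,): 1.0, (-1, 1): -4.0}
print(predict(example_tree, [1, -1, 1]))
print(predict(example_tree, [1, 1]))
```

In this sketch, deeper contexts contribute with a smaller weight, so a long context must carry a large value to override shorter ones. A self-bounded learner in the sense of the abstract would additionally decide how deep to grow the tree on each update, which this example does not attempt.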
