On-Line Cumulative Learning of Hierarchical Sparse n-grams

We present a system for on-line, cumulative learning of hierarchical collections of frequent patterns from unsegmented data streams. Such learning is critical for long-lived intelligent agents in complex worlds. Learned patterns enable prediction of unseen data and serve as building blocks for higher-level knowledge representation. We introduce a novel sparse n-gram model that, unlike pruned n-grams, learns on-line by stochastic search for frequent n-tuple patterns. Adding patterns as data arrives complicates probability calculations. We discuss an EM approach to this problem and introduce hierarchical sparse n-grams, a model that uses a better solution based on a new method for combining information across levels. A second new method for combining information from multiple granularities (n-gram widths) enables these models to search more effectively for frequent patterns (an on-line, stochastic analog of pruning in association rule mining). The result is an example of a rare combination: unsupervised, on-line, cumulative structure learning. Unlike prediction suffix tree (PST) mixtures, the model learns with no size bound while using less space than the data. It does not repeatedly iterate over the data (unlike MaxEnt feature construction). It discovers repeated structure on-line and (unlike PSTs) uses this to learn larger patterns. The type of repeated structure it captures is limited (e.g., compared to hierarchical HMMs) but still useful, and these are important first steps towards learning repeated structure in more expressive representations, which has seen little progress, especially in unsupervised, on-line contexts.
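
As a rough illustration of the style of model the abstract describes, the sketch below shows an on-line sparse n-gram that keeps counts only for an explicitly tracked set of n-tuples and promotes new candidates by stochastically sampling windows from the stream. This is a minimal sketch under assumed design choices, not the authors' algorithm; the class name, the sampling and promotion parameters, and the crude uniform back-off in prob() are all illustrative assumptions.

```python
# Minimal sketch (assumptions, not the paper's implementation) of an on-line
# sparse n-gram: counts are kept only for tracked n-tuples, and candidates
# are discovered by stochastic sampling of windows from the incoming stream.
import random
from collections import defaultdict, deque


class SparseNGram:
    def __init__(self, n, sample_rate=0.05, promote_count=3, seed=0):
        self.n = n                           # pattern width (n-gram order)
        self.counts = defaultdict(int)       # counts for tracked n-tuples only
        self.tracked = set()                 # patterns currently being counted
        self.candidates = defaultdict(int)   # provisional counts for sampled tuples
        self.sample_rate = sample_rate       # probability of sampling an untracked window
        self.promote_count = promote_count   # samples needed before tracking a candidate
        self.window = deque(maxlen=n)        # sliding window over the unsegmented stream
        self.total = 0                       # number of complete windows seen
        self.rng = random.Random(seed)

    def observe(self, symbol):
        """Consume one symbol from the stream, updating counts incrementally."""
        self.window.append(symbol)
        if len(self.window) < self.n:
            return
        tup = tuple(self.window)
        self.total += 1
        if tup in self.tracked:
            self.counts[tup] += 1
        elif self.rng.random() < self.sample_rate:
            # Stochastic search: occasionally sample an untracked window; if it
            # keeps reappearing in samples, promote it to the tracked set.
            self.candidates[tup] += 1
            if self.candidates[tup] >= self.promote_count:
                self.tracked.add(tup)
                del self.candidates[tup]

    def prob(self, tup, vocab_size):
        """Frequency estimate for a tuple. Untracked tuples share the leftover
        mass uniformly; this is a crude stand-in for the paper's method of
        combining information across levels of the hierarchy."""
        if tup in self.tracked:
            return self.counts[tup] / max(self.total, 1)
        tracked_mass = sum(self.counts.values())
        leftover = max(self.total - tracked_mass, 1)
        return leftover / max(self.total, 1) / (vocab_size ** self.n)


if __name__ == "__main__":
    model = SparseNGram(n=3)
    for ch in "abcabcabxabcab" * 50:
        model.observe(ch)
    print(sorted(model.tracked)[:5])
    print(model.prob(("a", "b", "c"), vocab_size=3))
```

A hierarchical version along the lines sketched in the abstract would maintain one such model per pattern width and combine their estimates across levels rather than backing off to a uniform distribution.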
