Paper information - Using the H-Divergence to Prune Probabilistic Automata

A common problem in probabilistic automata learning is the difficulty of handling large training samples and/or large alphabets. This is partly due to the size of the resulting Probabilistic Prefix Tree (PPT), which is the usual starting point for state-merging learning algorithms. In this paper, we propose a novel method to prune PPTs using the H-divergence d_H, recently introduced in the field of domain adaptation. d_H is based on the classification error of a hypothesis learned from unlabeled examples drawn from the two distributions being compared. Through a thorough comparison with state-of-the-art divergence measures, we provide experimental evidence of the efficiency of our method, which is based on this simple and intuitive criterion.
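To make the idea behind d_H concrete, the following is a minimal sketch of an empirical estimate, not the paper's implementation. It assumes the usual empirical form from the domain adaptation literature, d_H ≈ 2 (1 − min_h [err_P(h) + err_Q(h)]), where h tries to tell the two samples apart. The hypothesis class (threshold classifiers on a single numeric feature), the default feature (string length), and the function name `empirical_h_divergence` are all illustrative assumptions; the abstract does not say which hypothesis class the authors use or how the value drives pruning.

```python
from typing import Callable, Sequence


def empirical_h_divergence(
    sample_p: Sequence[str],
    sample_q: Sequence[str],
    feature: Callable[[str], float] = len,  # assumed feature: string length
) -> float:
    """Estimate d_H between two samples using threshold classifiers.

    Returns a value in [0, 2]: 0 when no threshold separates the samples
    better than chance, 2 when some threshold separates them perfectly.
    """
    if not sample_p or not sample_q:
        raise ValueError("both samples must be non-empty")

    xs_p = [feature(x) for x in sample_p]
    xs_q = [feature(x) for x in sample_q]

    # Candidate thresholds: every observed value, plus one above the maximum
    # so that the constant hypotheses are also covered.
    thresholds = sorted(set(xs_p) | set(xs_q))
    thresholds.append(thresholds[-1] + 1)

    best = 1.0  # a constant classifier always has err_p + err_q = 1
    for t in thresholds:
        # Hypothesis h: predict "drawn from P" iff feature(x) < t.
        err_p = sum(v >= t for v in xs_p) / len(xs_p)  # P examples sent to Q
        err_q = sum(v < t for v in xs_q) / len(xs_q)   # Q examples sent to P
        total = err_p + err_q
        # The mirrored hypothesis (predict P iff feature(x) >= t) has
        # summed error 2 - total, so both orientations are checked at once.
        best = min(best, total, 2.0 - total)

    return 2.0 * (1.0 - best)
```

For example, `empirical_h_divergence(["a", "ab"], ["abcd", "abcde"])` evaluates to 2.0 because a length threshold separates the two samples perfectly, while two identical samples give 0. This sketch scores hypotheses on the same data used to pick them; a more careful estimate would learn the classifier on one split and measure its error on a held-out split.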