Using the H-Divergence to Prune Probabilistic Automata

A common problem in probabilistic automata learning is the difficulty of dealing with large training samples and/or large alphabets. This is partly due to the size of the resulting Probabilistic Prefix Tree (PPT), to which state-merging learning algorithms are generally applied. In this paper, we propose a novel method to prune PPTs by making use of the H-divergence d_H, recently introduced in the field of domain adaptation. d_H is based on the classification error made by a hypothesis learned from unlabeled examples drawn according to the two distributions being compared. Through a thorough comparison with state-of-the-art divergence measures, we provide experimental evidence that demonstrates the efficiency of our method, which is based on this simple and intuitive criterion.
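To make the idea behind d_H concrete, the following is a minimal sketch of an empirical estimate, not the procedure used in the paper. It assumes one-dimensional numeric samples and a hypothesis class of threshold classifiers; both choices, as well as the function name `proxy_h_divergence`, are illustrative assumptions. Examples from the first sample are labeled 0, examples from the second are labeled 1, and the estimate is 2(1 - (err_P + err_Q)) for the best hypothesis in the class, so it lies between 0 (samples indistinguishable) and 2 (samples perfectly separable).

```python
def proxy_h_divergence(sample_p, sample_q):
    """Empirical H-divergence estimate for 1-D samples (illustrative sketch).

    Hypothesis class (an assumption): h_t(x) = 1 if x >= t else 0,
    together with the complement of each such classifier.
    """
    if not sample_p or not sample_q:
        raise ValueError("both samples must be non-empty")

    # Candidate thresholds: every observed value, plus one value above
    # the maximum so that the "predict 0 everywhere" classifier is included.
    thresholds = sorted(set(sample_p) | set(sample_q))
    thresholds.append(thresholds[-1] + 1.0)

    best = 1.0  # smallest err_p + err_q found so far
    for t in thresholds:
        # Fraction of P examples wrongly predicted as coming from Q.
        err_p = sum(x >= t for x in sample_p) / len(sample_p)
        # Fraction of Q examples wrongly predicted as coming from P.
        err_q = sum(x < t for x in sample_q) / len(sample_q)
        total = err_p + err_q
        # The complement classifier has error sum 2 - total.
        best = min(best, total, 2.0 - total)

    return 2.0 * (1.0 - best)


if __name__ == "__main__":
    print(proxy_h_divergence([0.0, 1.0, 2.0], [10.0, 11.0, 12.0]))
    print(proxy_h_divergence([0.0, 1.0, 2.0], [0.0, 1.0, 2.0]))
```

Because the hypothesis is selected and evaluated on the same examples, this estimate is optimistic; a held-out split would give a fairer value. Comparing distributions over strings, as in a PPT, would require a different feature representation and hypothesis class than the ones assumed here.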
