Probabilistic Language Modelling

Language models assign probabilities to strings of symbols. Their interpretation is reviewed and applied to text classification. A language recogniser is constructed from Bayes' theorem and a simple bigram model. The recogniser gives near-perfect results on individual sentences of text, which motivates a mixture language model. Hidden Markov models (HMMs) are reviewed as a method of capturing order over different length scales and are used to construct such a mixture model. The mixture model allows text to be segmented into unknown languages and foreign words from known languages to be extracted from English text. Future directions are discussed.
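As an illustration of the recogniser described in the abstract, the following is a minimal sketch, not the author's implementation. It assumes a character-level bigram model over a 27-symbol alphabet, add-alpha smoothing, and a uniform prior over languages; the names `BigramModel`, `posterior`, `normalise` and the toy training strings are all hypothetical. Bayes' theorem gives P(language | text) proportional to P(text | language) P(language), with the likelihood taken from each language's bigram model.

```python
import math
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz "  # assumed symbol set: letters plus space


def normalise(text):
    """Lowercase the text and drop any symbol outside the assumed alphabet."""
    return "".join(ch for ch in text.lower() if ch in ALPHABET)


class BigramModel:
    """Character bigram model with add-alpha smoothing (an assumed choice)."""

    def __init__(self, training_text, alpha=1.0):
        text = normalise(training_text)
        self.alpha = alpha
        self.pair_counts = Counter(zip(text, text[1:]))
        self.context_counts = Counter(text[:-1])

    def log_prob(self, text):
        """Log P(text | model), conditioned on the first character."""
        text = normalise(text)
        size = len(ALPHABET)
        total = 0.0
        for prev, cur in zip(text, text[1:]):
            num = self.pair_counts[(prev, cur)] + self.alpha
            den = self.context_counts[prev] + self.alpha * size
            total += math.log(num / den)
        return total


def posterior(models, priors, text):
    """Bayes' theorem: P(lang | text) is proportional to P(text | lang) P(lang)."""
    log_joint = {
        lang: math.log(priors[lang]) + model.log_prob(text)
        for lang, model in models.items()
    }
    # Subtract the maximum before exponentiating to avoid underflow.
    top = max(log_joint.values())
    norm = sum(math.exp(v - top) for v in log_joint.values())
    return {lang: math.exp(v - top) / norm for lang, v in log_joint.items()}


# Toy training data, for illustration only.
models = {
    "english": BigramModel(
        "the quick brown fox jumps over the lazy dog and the cat sat on the mat "
        "while the children were reading their books in the garden"
    ),
    "french": BigramModel(
        "le renard brun saute par dessus le chien paresseux et le chat est assis "
        "sur le tapis pendant que les enfants lisent leurs livres dans le jardin"
    ),
}
priors = {"english": 0.5, "french": 0.5}

for sentence in ("the cat sat on the mat", "le chat est assis sur le tapis"):
    post = posterior(models, priors, sentence)
    print(sentence, "->", max(post, key=post.get))
```

Working in log probabilities keeps the product of many small bigram probabilities numerically stable, and the smoothing constant prevents a single unseen bigram from driving a language's posterior to zero.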
