Paper information: Probabilistic Language Modelling

Probabilistic Language Modelling

Language models assign probabilities to strings of symbols. Their interpretation is reviewed and applied to text classification. A language recogniser is constructed from Bayes' theorem and a simple bigram model. The recogniser gives near-perfect results on sentences of text and motivates a mixture language model. Hidden Markov models (HMMs) are reviewed as a method of capturing order over different length scales and are used to construct such a mixture model. The mixture model allows text to be segmented into unknown languages and foreign words in known languages to be extracted from English text. Future directions are discussed.
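To make the recogniser idea concrete, the following is a minimal sketch of a language classifier built from Bayes' theorem and a character bigram model. It is an illustration under stated assumptions, not the paper's implementation: the add-alpha smoothing, the tiny training strings, the uniform priors, and the names `BigramModel` and `classify` are all hypothetical choices made for this example.

```python
import math
from collections import Counter


class BigramModel:
    """Character bigram model with add-alpha smoothing.

    The smoothing scheme is an assumption for this sketch; the paper may
    use a different estimator.
    """

    def __init__(self, text, alphabet, alpha=1.0):
        self.vocab_size = len(set(alphabet))
        self.alpha = alpha
        # Counts of adjacent character pairs and of each context character.
        self.pair_counts = Counter(zip(text, text[1:]))
        self.context_counts = Counter(text[:-1])

    def log_prob(self, s):
        """Return log P(s | model), conditioning on the first character."""
        total = 0.0
        for prev, cur in zip(s, s[1:]):
            num = self.pair_counts[(prev, cur)] + self.alpha
            den = self.context_counts[prev] + self.alpha * self.vocab_size
            total += math.log(num / den)
        return total


def classify(s, models, priors):
    """Return the posterior P(language | s) using Bayes' theorem."""
    log_joint = {
        lang: math.log(priors[lang]) + model.log_prob(s)
        for lang, model in models.items()
    }
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_joint.values())
    unnorm = {lang: math.exp(v - top) for lang, v in log_joint.items()}
    z = sum(unnorm.values())
    return {lang: u / z for lang, u in unnorm.items()}


# Toy training data (hypothetical; real use needs far larger corpora).
english = "the cat sat on the mat and the dog ate the bone "
german = "der hund und die katze sind in dem haus und schlafen "
alphabet = set(english + german)

models = {
    "en": BigramModel(english, alphabet),
    "de": BigramModel(german, alphabet),
}
priors = {"en": 0.5, "de": 0.5}

posterior = classify("the dog and the cat", models, priors)
best = max(posterior, key=posterior.get)
print(best, posterior)
```

Working in log probabilities avoids underflow on long strings, and normalising over the candidate languages turns the per-language likelihoods into a posterior that can be compared directly.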

