Tagging English Text with a Probabilistic Model

In this paper we present some experiments on the use of a probabilistic model to tag English text, i.e. to assign to each word the correct tag (part of speech) in the context of the sentence. The main novelty of these experiments is the use of untagged text in the training of the model. We have used a simple triclass Markov model and are looking for the best way to estimate the parameters of this model, depending on the kind and amount of training data provided. Two approaches in particular are compared and combined: (1) using text that has been tagged by hand and computing relative frequency counts, and (2) using text without tags and training the model as a hidden Markov process, according to a Maximum Likelihood principle. Experiments show that the best training is obtained by using as much tagged text as possible. They also show that Maximum Likelihood training, the procedure that is routinely used to estimate hidden Markov model parameters from training data, will not necessarily improve the tagging accuracy. In fact, it will generally degrade this accuracy, except when only a limited amount of hand-tagged text is available.
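
The first approach, relative frequency counts over hand-tagged text, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `train_relative_frequency`, the `"<s>"` padding symbol, the tag names, and the lack of any smoothing or interpolation are all assumptions made for illustration. It estimates the two parameter sets of a triclass model, the probability of a tag given the two preceding tags and the probability of a word given its tag.

```python
from collections import Counter


def train_relative_frequency(tagged_sentences):
    """Estimate triclass model parameters by relative frequency.

    tagged_sentences: list of sentences, each a list of (word, tag) pairs.
    Returns (transition, emission) where
      transition[(t1, t2, t3)] approximates P(t3 | t1, t2)
      emission[(tag, word)]    approximates P(word | tag)
    No smoothing is applied (an illustrative simplification).
    """
    trigram_counts = Counter()
    context_counts = Counter()
    emit_counts = Counter()
    tag_counts = Counter()

    for sentence in tagged_sentences:
        # Pad with two hypothetical start symbols so the first real tag
        # also has a two-tag history.
        tags = ["<s>", "<s>"] + [tag for _, tag in sentence]
        for i in range(2, len(tags)):
            trigram_counts[(tags[i - 2], tags[i - 1], tags[i])] += 1
            context_counts[(tags[i - 2], tags[i - 1])] += 1
        for word, tag in sentence:
            emit_counts[(tag, word)] += 1
            tag_counts[tag] += 1

    # Relative frequency: count of the event divided by count of its context.
    transition = {
        trigram: count / context_counts[trigram[:2]]
        for trigram, count in trigram_counts.items()
    }
    emission = {
        pair: count / tag_counts[pair[0]]
        for pair, count in emit_counts.items()
    }
    return transition, emission
```

The second approach would start from parameters such as these and re-estimate them on untagged text with the forward-backward procedure; that step is omitted here.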
