Statistical language modeling with a class-basedn-multigram model

In this paper, we present a stochastic language-modeling tool which aims at retrieving variable-length phrases (multigrams), assuming n -gram dependencies between them, hence the name of the model: n -multigram. The estimation of the probability distribution of the phrases is intermixed with a phrase-clustering procedure in a way which jointly optimizes the likelihood of the data. As a result, the language data are iteratively structured at both a paradigmatic and a syntagmatic level in a fully integrated way. We evaluate the 2-multigram model as a statistical language model on ATIS, a task-oriented database consisting of air travel reservations. Experiments show that the 2-multigram model allows a reduction of 10% of the word error rate on ATIS with respect to the usual trigram model, with 25% fewer parameters than in the trigram model. In addition, we illustrate the ability of this model to merge semantically related phrases of different lengths into a common class.

[1]  Mari Ostendorf,et al.  Variable n-grams and extensions for conversational speech language modeling , 2000, IEEE Trans. Speech Audio Process..

[2]  Frédéric Bimbot,et al.  Introducing statistical dependencies and structural constraints in variable-length sequence models , 1996, ICGI.

[3]  Yoshinori Sagisaka,et al.  Variable-order N-gram generation by word-class splitting and consecutive word grouping , 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.

[4]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[5]  Hermann Ney,et al.  Algorithms for bigram and trigram word clustering , 1995, Speech Commun..

[6]  Slava M. Katz,et al.  Estimation of probabilities from sparse data for the language model component of a speech recognizer , 1987, IEEE Trans. Acoust. Speech Signal Process..

[7]  Ian H. Witten,et al.  The zero-frequency problem: Estimating the probabilities of novel events in adaptive text compression , 1991, IEEE Trans. Inf. Theory.

[8]  Jianying Hu,et al.  Language modeling using stochastic automata with variable length contexts , 1997, Comput. Speech Lang..

[9]  Robert L. Mercer,et al.  Class-Based n-gram Models of Natural Language , 1992, CL.

[10]  R. Pieraccini,et al.  Variable-length sequence modeling: multigrams , 1995, IEEE Signal Processing Letters.

[11]  Jan Robin Rohlicek,et al.  Statistical language modeling combining N-gram and context-free grammars , 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[12]  Roberto Pieraccini,et al.  Stochastic automata for language modeling , 1996, Comput. Speech Lang..

[13]  Thomas Niesler,et al.  Variable-length categoryn-gram language models , 1999, Comput. Speech Lang..