Putting language into language modeling

In this paper we describe the statistical Structured Language Model (SLM), which uses a grammatical analysis of the hypothesized sentence segment (prefix) to predict the next word. We first describe the operation of a basic, completely lexicalized SLM that builds up partial parses as it proceeds left to right. We then develop a chart parsing algorithm and, with its help, a method to compute the prediction probabilities $P(w_{i+1} \mid W_i)$. We suggest useful computational shortcuts, followed by a method for training the SLM parameters from text data. Finally, we introduce a more detailed parametrization that involves non-terminal labeling and considerably improves the smoothing of the SLM's statistical parameters. We conclude by presenting recognition and perplexity results achieved on standard corpora.
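To make the prediction step concrete, the following is a minimal, illustrative sketch (not the paper's implementation) of computing a next-word distribution from hypothesized partial parses of the prefix. It assumes each partial parse exposes two headwords that the word predictor conditions on, and mixes the per-parse predictions by the parses' probabilities; the class name, the headword pairs, and the simple count-based tables are placeholders for illustration only.

```python
from collections import defaultdict


class ToyStructuredPredictor:
    """Toy next-word predictor conditioned on exposed headwords of partial parses."""

    def __init__(self):
        # counts[(h0, h_minus1)][word] = how often `word` followed that headword pair
        self.counts = defaultdict(lambda: defaultdict(int))

    def observe(self, h0, h_minus1, next_word):
        """Accumulate a training event from a parsed sentence prefix."""
        self.counts[(h0, h_minus1)][next_word] += 1

    def predict(self, partial_parses):
        """Return an approximate P(w_{i+1} | W_i).

        partial_parses: list of (h0, h_minus1, parse_prob) triples describing
        the hypothesized partial parses of the prefix W_i.
        """
        dist = defaultdict(float)
        total_parse_prob = sum(p for _, _, p in partial_parses) or 1.0
        for h0, h_minus1, parse_prob in partial_parses:
            table = self.counts[(h0, h_minus1)]
            total = sum(table.values())
            if total == 0:
                continue  # unseen headword pair; a real model would smooth here
            weight = parse_prob / total_parse_prob
            for word, count in table.items():
                dist[word] += weight * (count / total)
        return dict(dist)


# Usage sketch: train on a few headword/word events, then predict.
model = ToyStructuredPredictor()
model.observe("contract", "the", "ended")
model.observe("contract", "the", "expired")
print(model.predict([("contract", "the", 0.7), ("loss", "the", 0.3)]))
```

The key point the sketch illustrates is that, unlike an n-gram model, the conditioning context comes from the syntactic structure of the whole prefix rather than from the immediately preceding words alone.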