Speech and Language Processing: an Introduction to Speech Recognition, Computational Linguistics and Natural Language Processing. Daniel Jurafsky & 4 N-grams

But it must be recognized that the notion "probability of a sentence" is an entirely useless one, under any known interpretation of this term.

Anytime a linguist leaves the group the recognition rate goes up.¹

Radar O'Reilly, the mild-mannered clerk of the 4077th M*A*S*H unit, had an uncanny ability to guess what his interlocutor was about to say. Let's look at another kind of word prediction: what word is likely to follow this fragment?

I'd like to make a collect...

Hopefully most of you concluded that a very likely word is call, or international or phone, but probably not the. We will formalize this idea of word prediction by building probabilistic models of word sequences called N-grams, which predict the next word from the previous N − 1 words. Such statistical models of word sequences are also called language models or LMs. Computing the probability of the next word will turn out to be closely related to computing the probability of a sequence of words. The following sequence, for example, has a non-zero probability of being encountered in a text written in English:

...all of a sudden I notice three guys standing on the sidewalk...

while this same set of words in a different order has a very low probability:

on guys all I of notice sidewalk three a sudden standing the

¹ In an address to the first Workshop on the Evaluation of NLP Systems, Dec 7, 1988. The workshop is described in Palmer and Finin (1990) but the quote wasn't written down; some remember a more snappy version: Every time I fire a linguist the performance of the recognizer improves.
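The N-gram idea described above can be made concrete with a minimal sketch. It assumes the simplest case, a bigram model (N = 2) whose probabilities are plain relative frequencies with no smoothing. The toy corpus, the `<s>` and `</s>` boundary markers, and all function names are illustrative assumptions, not taken from the chapter. The sketch shows how the probability of the next word and the probability of a whole word sequence come from the same counts: the sequence probability is the product of the next-word probabilities.

```python
from collections import Counter


def train_bigram(sentences):
    """Count bigrams and their one-word contexts.

    Each sentence is lowercased, split on whitespace, and padded with
    <s> and </s> (hypothetical boundary markers chosen for this sketch).
    """
    context_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return context_counts, bigram_counts


def next_word_prob(word, prev, context_counts, bigram_counts):
    """Relative-frequency estimate of P(word | prev); 0.0 for an unseen context."""
    if context_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / context_counts[prev]


def sentence_prob(sentence, context_counts, bigram_counts):
    """Probability of a whole sequence as a product of next-word probabilities."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    prob = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        prob *= next_word_prob(word, prev, context_counts, bigram_counts)
    return prob


# Toy corpus invented for illustration only.
toy_corpus = [
    "I'd like to make a collect call",
    "I'd like to make a phone call",
    "I'd like to make a collect call please",
]
context_counts, bigram_counts = train_bigram(toy_corpus)

p_call = next_word_prob("call", "collect", context_counts, bigram_counts)
p_the = next_word_prob("the", "collect", context_counts, bigram_counts)
p_ordered = sentence_prob("I'd like to make a collect call",
                          context_counts, bigram_counts)
p_scrambled = sentence_prob("call collect a make to like I'd",
                            context_counts, bigram_counts)
print(p_call, p_the, p_ordered, p_scrambled)
```

On this toy corpus, call is far more likely than the after collect, and the well-ordered sentence scores higher than its scrambled counterpart. Because the sketch is unsmoothed, any unseen bigram drives the whole product to exactly zero rather than to a very low probability; a practical model would smooth the counts to avoid this.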

[1]  Paul R. Cohen,et al.  Empirical methods for artificial intelligence , 1995, IEEE Expert.

[2]  L. Baum,et al.  An inequality and associated maximization technique in statistical estimation of probabilistic functions of a Markov process , 1972 .

[3]  Alan F. Newell,et al.  The rôle of natural language processing in alternative and augmentative communication , 1998, Natural Language Engineering.

[4]  Karen Kukich,et al.  Techniques for automatically correcting words in text , 1992, CSUR.

[5]  Thomas Niesler,et al.  Modelling word-pair relations in a category-based language model , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[6]  Marcus Kracht,et al.  The mathematics of language , 2003 .

[7]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM algorithm plus discussions on the paper , 1977 .

[8]  I. Good.  The Population Frequencies of Species and the Estimation of Population Parameters , 1953 .

[9]  D. Foster.  Elegy by W.S.: A Study in Attribution , 1989 .

[10]  Mari Ostendorf,et al.  Transforming out-of-domain estimates to improve in-domain language models , 1997, EUROSPEECH.

[11]  Ian H. Witten,et al.  The zero-frequency problem: Estimating the probabilities of novel events in adaptive text compression , 1991, IEEE Trans. Inf. Theory.

[12]  J. Baker,et al.  The DRAGON system--An overview , 1975 .

[13]  J. Cleary,et al.  Self-organized Language Modeling for Speech Recognition , 2022 .

[14]  Peter Norvig,et al.  Artificial Intelligence: A Modern Approach , 1995 .

[15]  Jerome R. Bellegarda,et al.  Speech recognition experiments using multi-span statistical language models , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[16]  F. Jelinek,et al.  Continuous speech recognition by statistical methods , 1976, Proceedings of the IEEE.

[17]  Ronald Rosenfeld,et al.  Nonlinear interpolation of topic models for language model adaptation , 1998, ICSLP.

[18]  L. M. M.-T. Theory of Probability , 1929, Nature.

[19]  Guodong Zhou,et al.  MI-trigger-based Language Modelling , 1998, PACLIC.

[20]  Hermann Ney,et al.  Improved clustering techniques for class-based statistical language modelling , 1993, EUROSPEECH.

[21]  Ronald Rosenfeld,et al.  A maximum entropy approach to adaptive statistical language modelling , 1996, Comput. Speech Lang.

[22]  Hermann Ney,et al.  On structuring probabilistic dependences in stochastic language modelling , 1994, Comput. Speech Lang.

[23]  Thomas M. Cover,et al.  A convergent gambling estimate of the entropy of English , 1978, IEEE Trans. Inf. Theory.

[24]  G. A. Miller,et al.  Finitary models of language users , 1963 .

[25]  Thomas G. Dietterich.  Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms , 1998, Neural Computation.

[26]  Stanley F. Chen,et al.  An Empirical Study of Smoothing Techniques for Language Modeling , 1996, ACL.

[27]  Thomas M. Cover,et al.  Elements of Information Theory , 2005 .

[28]  A. Nadas,et al.  Estimation of probabilities in the language model of the IBM speech recognition system , 1984 .

[29]  Jun Wu,et al.  A maximum entropy language model integrating N-grams and topic dependencies for conversational speech recognition , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[30]  I.N. Bozkurt,et al.  Authorship attribution , 2007, 2007 22nd international symposium on computer and information sciences.

[31]  David Yarowsky,et al.  Dynamic Nonlocal Language Modeling via Hierarchical Topic-Based Adaptation , 1999, ACL.

[32]  T. Cover,et al.  A sandwich proof of the Shannon-McMillan-Breiman theorem , 1988 .

[33]  Slava M. Katz,et al.  Estimation of probabilities from sparse data for the language model component of a speech recognizer , 1987, IEEE Trans. Acoust. Speech Signal Process.

[34]  John J. Godfrey,et al.  SWITCHBOARD: telephone speech corpus for research and development , 1992, [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[35]  Andreas Stolcke,et al.  Statistical language modeling for speech disfluencies , 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.

[36]  F. Mosteller,et al.  Inference and Disputed Authorship: The Federalist , 1966 .

[37]  Frederick Jelinek,et al.  Interpolated estimation of Markov source parameters from sparse data , 1980 .

[38]  Kenneth Ward Church,et al.  A comparison of the enhanced Good-Turing and deleted estimation methods for estimating probabilities of English bigrams , 1991 .

[39]  Eric Atwell,et al.  Large-scale lexical semantics for speech recognition support , 1997, EUROSPEECH.

[40]  Lalit R. Bahl,et al.  A Maximum Likelihood Approach to Continuous Speech Recognition , 1983, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[41]  Thomas Niesler,et al.  A variable-length category-based n-gram language model , 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.

[42]  Noam Chomsky.  Three models for the description of language , 1956, IRE Trans. Inf. Theory.

[43]  Peter A. Heeman,et al.  POS Tags and Decision Trees for Language Modeling , 1999, EMNLP.

[44]  Daniel Jurafsky,et al.  Towards better integration of semantic predictors in statistical language modeling , 1998, ICSLP.

[45]  H. Kucera,et al.  Computational analysis of present-day American English , 1967 .

[46]  Ronald Rosenfeld,et al.  Topic adaptation for language modeling using unnormalized exponential models , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[47]  Andreas Stolcke,et al.  The Berkeley Restaurant Project , 1994, ICSLP.

[48]  Timothy W. Finin,et al.  Workshop on the Evaluation of Natural Language Processing Systems , 1990, Comput. Linguistics.

[49]  Robert L. Mercer,et al.  Class-Based n-gram Models of Natural Language , 1992, CL.

[50]  G. A. Miller,et al.  Verbal context and the recall of meaningful material. , 1950, The American journal of psychology.

[51]  Renato De Mori,et al.  A Cache-Based Natural Language Model for Speech Recognition , 1990, IEEE Trans. Pattern Anal. Mach. Intell.

[52]  Reinhard Kneser,et al.  Statistical language modeling using a variable context length , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[53]  Kenneth Ward Church,et al.  What's Wrong with Adding One? , 1994 .

[54]  Aaron D. Wyner,et al.  Prediction and Entropy of Printed English , 1993 .

[55]  Geoffrey Sampson.  Evolutionary Language Understanding , 1996 .