A model for context effects in speech recognition.

A model is presented that quantifies the effect of context on speech recognition. In this model, a speech stimulus is treated as a concatenation of equivalent elements (e.g., the phonemes constituting a word). The model employs probabilities that individual elements are recognized and probabilities that missed elements are guessed from contextual information. Predictions are given of the probability that the entire stimulus, or part of it, is reproduced correctly. The model can be applied to both speech recognition and visual recognition of printed text. It has been verified with data obtained for syllables of the consonant-vowel-consonant (CVC) type presented near the reception threshold in quiet and in noise, with the results of an experiment using orthographic presentation of incomplete CVC syllables, and with word counts in a CVC lexicon. A remarkable outcome of the analysis is that cues occurring only in spoken language (e.g., coarticulatory cues) appear to have a much greater influence on recognition performance when the stimuli are presented near the threshold in noise than when they are presented near the absolute threshold. Demonstrations are given of further predictions provided by the model: word recognition as a function of signal-to-noise ratio, closed-set word recognition, recognition of interrupted speech, and sentence recognition.
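To illustrate the structure of such a model, consider a minimal sketch under simplifying assumptions (the symbols $n$, $p$, and $c_j$ are introduced here for illustration only, and the independence assumption need not match the paper's exact formulation): suppose a stimulus consists of $n$ equivalent elements, each recognized independently with probability $p$, and let $c_j$ be the probability that a stimulus with $j$ missed elements is nevertheless completed correctly from context (with $c_0 = 1$). The probability of reproducing the entire stimulus correctly is then

\[
P_{\text{whole}} \;=\; \sum_{j=0}^{n} \binom{n}{j}\, p^{\,n-j} (1-p)^{j}\, c_j .
\]

For a CVC syllable ($n = 3$) with, say, $p = 0.8$, $c_1 = 0.5$, and $c_2 = c_3 = 0$, this gives $P_{\text{whole}} = 0.8^3 + 3(0.8)^2(0.2)(0.5) = 0.704$, compared with $p^n = 0.512$ when no contextual guessing is available.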