Speechreading and the structure of the lexicon: computationally modeling the effects of reduced phonetic distinctiveness on lexical uniqueness.

A lexical modeling methodology was employed to examine how the distribution of phonemic patterns in the lexicon constrains lexical equivalence under the conditions of reduced phonetic distinctiveness experienced by speechreaders. The technique involved (1) selection of a phonemically transcribed, machine-readable lexical database, (2) definition of transcription rules based on measures of phonetic similarity, (3) application of the transcription rules to the lexical database and formation of lexical equivalence classes, and (4) computation of three metrics to examine the transcribed lexicon. The metric percent words unique demonstrated that the distribution of words in the language substantially preserves lexical uniqueness across a wide range in the number of potentially available phonemic distinctions. Expected class size demonstrated that if at least 12 phonemic equivalence classes were available, any given word would be highly similar to only a few other words. Percent information extracted (PIE) [D. Carter, Comput. Speech Lang. 2, 1-11 (1987)] provided evidence that high-frequency words tend not to reside in the same lexical equivalence classes as other high-frequency words. The steepness of the functions obtained for each metric shows that small increments in the number of visually perceptible phonemic distinctions can result in substantial changes in lexical uniqueness.
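
To make the four-step procedure concrete, the Python sketch below shows one way such an analysis can be implemented. It is a minimal illustration rather than the study's actual code: the PHONEME_CLASS mapping, the toy lexicon and frequencies, the function names, and the particular form of the PIE computation (word-probability-weighted log probability of a word's class, normalized by the corresponding quantity for the words themselves, following a common reading of Carter's information-theoretic measure) are assumptions introduced here for exposition, not the transcription rules or database used in the study.

    from collections import defaultdict
    from math import log2

    # Hypothetical phoneme-to-class mapping standing in for transcription
    # rules derived from phonetic similarity; each phoneme is replaced by
    # the label of its (viseme-like) equivalence class.
    PHONEME_CLASS = {
        "p": "BILABIAL", "b": "BILABIAL", "m": "BILABIAL",
        "f": "LABIODENTAL", "v": "LABIODENTAL",
        "t": "ALVEOLAR", "d": "ALVEOLAR", "n": "ALVEOLAR",
        "k": "VELAR", "g": "VELAR",
        "i": "V_HIGH_FRONT", "u": "V_HIGH_BACK", "a": "V_LOW",
    }

    def transcribe(phonemes):
        """Map a word's phonemic transcription onto class labels."""
        return tuple(PHONEME_CLASS.get(p, p) for p in phonemes)

    def equivalence_classes(lexicon):
        """Group words whose reduced transcriptions are identical.
        `lexicon` maps each word to its phoneme sequence."""
        classes = defaultdict(list)
        for word, phonemes in lexicon.items():
            classes[transcribe(phonemes)].append(word)
        return classes

    def percent_words_unique(classes, n_words):
        """Percentage of words that are alone in their equivalence class."""
        unique = sum(1 for members in classes.values() if len(members) == 1)
        return 100.0 * unique / n_words

    def expected_class_size(classes, n_words):
        """Expected size of the class containing a randomly sampled word."""
        return sum(len(members) ** 2 for members in classes.values()) / n_words

    def percent_information_extracted(classes, freq):
        """PIE: share of the information needed to identify a word that is
        supplied by knowing its class; frequencies are assumed positive."""
        total = sum(freq.values())
        num = den = 0.0
        for members in classes.values():
            p_class = sum(freq[w] for w in members) / total
            for w in members:
                p_w = freq[w] / total
                num += p_w * log2(p_class)
                den += p_w * log2(p_w)
        return 100.0 * num / den

A toy run shows the intended behavior: with the lexicon {"pat": ["p","a","t"], "bat": ["b","a","t"], "mat": ["m","a","t"], "fit": ["f","i","t"]}, the first three words collapse into a single equivalence class while "fit" remains distinct, so percent_words_unique returns 25.0 and expected_class_size returns 2.5; supplying a frequency dictionary for the same words yields a PIE value between 0 (all words in one class) and 100 (every word unique). Plotting these metrics as the number of phonemic equivalence classes is varied reproduces the kind of functions described in the abstract.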