The extent of coarticulatory effects: Implications for models of speech recognition

The influence of adjacent context on the perception of a phoneme (coarticulation) is an important phenomenon to study because theories of perception differ in the importance they attribute to such effects. One class of theory that plays down the role of coarticulation holds that speech is matched against a set of stored templates. Such theories have been applied to the recognition of speech by man and machine (Klatt, 1979). The idea behind them is that similar signals occur every time the same speech segments are spoken. Speech could therefore be matched against a set of templates corresponding to these units, since the templates are invariantly related to the speech at this level. But if coarticulation extends beyond the segments for which the templates are specified, the templates would not be invariant, and a recognition system based on this principle might not work. Template theories thus depend on coarticulation having a limited and specified extent. Klatt (1979) has been explicit about how long he thinks the segments would have to be: about as long as a diphone (the part of the speech signal that occurs at the transition between phonemes). He takes this view because he considers coarticulation effects to be limited to diphones. The perceptual extent of coarticulation needs to be ascertained in order to see whether the diphone is a satisfactory basis for template matching. If it is not, such a scheme could still work with bigger templates (that is, the size of the units could be increased until no coarticulation occurs across them). However, the bigger the templates needed, the less efficient template theories become, because each stored unit is larger and more units are needed.
Thus, examining this question does not just serve to evaluate a particular scheme for recognition but also assesses the practical feasibility of template theories in general. The perceptual extent of coarticulation has been assessed in several ways. One is to see whether listeners can reliably identify whether a vowel isolated from its consonant was spoken in a nasal or non-nasal environment (Ali et al., 1971). Another is to see whether classification of vowels is faster when they are in their proper context than when they are cross-spliced into another context (Martin & Bunnell, 1982). However, such studies do not show effects on the identifiability of phonemes, so they do not indicate whether coarticulation would lead a recognition system based on template matching to misidentify phonemes and, therefore, classify words incorrectly. In the experiments reported here, the general rationale is to determine the circumstances under which coarticulation of nasalization from a nasal consonant to an adjacent non-nasal vowel affects the identifiability of the vowel. If such effects occur over units longer than a diphone, bigger templates would be needed to implement a template-based recognition system. The language chosen for the tests was French because it has contrastive nasalization. Natural speech was used rather than synthetic speech since the latter already embodies assumptions about coarticulation. Native speakers of French were used because it is possible that they are more careful to control nasal coarticulation than speakers of languages where nasalization is not used contrastively. Two native French speakers were employed (one female and one male).

[1] P. Howell. Velar Coarticulation in the Speech of French-English Bilinguals. 1981.

[2] R. Daniloff et al. Perception of coarticulated nasality. The Journal of the Acoustical Society of America, 1971.

[3] J. G. Martin et al. Perception of anticipatory coarticulation effects. The Journal of the Acoustical Society of America, 1981.

[4] Dennis H. Klatt et al. Speech perception: a model of acoustic–phonetic analysis and lexical access. 1979.