Analysis, synthesis, and perception of visible articulatory movements

Abstract: Significant advances in isolating the perceptually important properties of speech sounds followed the development of techniques for acoustical speech synthesis in the early 1950s. Prescribing the spectro-temporal structure of the acoustic signal made it possible to manipulate individual acoustical parameters and thus to study how speech sounds are identified and discriminated. This paper reviews attempts to develop analogous facilities for studying the perception of visible facial articulatory movements in lipreading. A new approach, involving interrelated procedures for measuring, modelling, and animating displays of a talking face with computer graphics, is described, along with a prototypic perceptual experiment. The results suggest that the important visible properties of point vowels may not be fully captured by descriptions of vertical jaw movement, horizontal and vertical oral opening, and lip shape, even when the vowels are spoken carefully. Additional cues, probably involving the visibility of the teeth and tongue tip, appear to be required for accurate identification.
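
The parametric approach the abstract describes can be illustrated with a minimal sketch. The following Python example is purely hypothetical: the parameter names (`jaw_opening`, `lip_width`, `lip_height`) and the numeric vowel targets are illustrative assumptions, not measurements or methods from the study. It shows how a small set of visible articulatory parameters, corresponding to the descriptors discussed above, might be keyframed and interpolated to drive a graphical talking-face display.

```python
from dataclasses import dataclass

@dataclass
class ArticulatoryFrame:
    """One frame of visible articulatory parameters (arbitrary units).

    The three fields mirror the descriptors named in the abstract:
    vertical jaw movement, horizontal oral opening, and vertical oral
    opening / lip shape. Values are illustrative, not measured data.
    """
    jaw_opening: float   # vertical jaw displacement
    lip_width: float     # horizontal oral opening
    lip_height: float    # vertical oral opening

# Hypothetical targets for the three point vowels.
VOWEL_TARGETS = {
    "i": ArticulatoryFrame(jaw_opening=0.2, lip_width=0.9, lip_height=0.2),
    "a": ArticulatoryFrame(jaw_opening=0.9, lip_width=0.5, lip_height=0.8),
    "u": ArticulatoryFrame(jaw_opening=0.3, lip_width=0.2, lip_height=0.4),
}

def interpolate(a: ArticulatoryFrame, b: ArticulatoryFrame,
                t: float) -> ArticulatoryFrame:
    """Linearly interpolate between two keyframes, t in [0, 1]."""
    lerp = lambda x, y: x + (y - x) * t
    return ArticulatoryFrame(
        jaw_opening=lerp(a.jaw_opening, b.jaw_opening),
        lip_width=lerp(a.lip_width, b.lip_width),
        lip_height=lerp(a.lip_height, b.lip_height),
    )

def animate(sequence: list[str],
            frames_per_transition: int = 10) -> list[ArticulatoryFrame]:
    """Generate a frame-by-frame trajectory through a vowel sequence."""
    frames: list[ArticulatoryFrame] = []
    for start, end in zip(sequence, sequence[1:]):
        for step in range(frames_per_transition):
            t = step / frames_per_transition
            frames.append(interpolate(VOWEL_TARGETS[start],
                                      VOWEL_TARGETS[end], t))
    frames.append(VOWEL_TARGETS[sequence[-1]])
    return frames

if __name__ == "__main__":
    # Sweep through the three point vowels and print each frame.
    for frame in animate(["i", "a", "u"], frames_per_transition=4):
        print(frame)
```

Notably, a display driven only by these three parameters would render nothing of the teeth or tongue tip, which is one way to read the abstract's finding that such descriptions underdetermine point-vowel identification.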