Converting speech into lip movements: a multimedia telephone for hard of hearing people

Presents the latest results of a research activity aimed at developing a multimedia telephone for hard of hearing people, based on the conversion of speech into graphic animation suitable for lipreading. The approach aims at experimenting with a pilot telematic service for interpersonal communication, thereby enlarging and improving the possibilities of social integration for hearing-impaired people. A preliminary prototype of the multimedia telephone has been integrated into a software demonstrator implemented on a Silicon Graphics workstation and is currently being evaluated in cooperation with FIADDA, the Italian association of parents of hearing-impaired children, to prove the feasibility of the system and the concrete possibility of providing new relay mediation services oriented to multimedia interpersonal communication for hard of hearing users. The developed algorithms rely on advanced methodologies in nonlinear signal processing through neural network architectures, geometric modeling, computer graphics, and animation. Experimental results, although still preliminary, are encouraging and support the system's feasibility.
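The core speech-to-animation conversion described above can be pictured as a per-frame mapping from acoustic features to lip-shape parameters learned by a neural network. The sketch below is only illustrative: the feature count, the three lip parameters, and the network weights are assumptions, not the paper's actual architecture, and a real system would train the weights on synchronized audio/video recordings.

```python
import numpy as np

# Hypothetical sizes; the abstract does not specify the actual feature set.
N_ACOUSTIC = 12   # e.g., 12 spectral coefficients per speech frame
N_LIP = 3         # e.g., mouth width, mouth opening, lip protrusion

rng = np.random.default_rng(0)

# Stand-in "trained" weights for a small two-layer network.
W1 = rng.standard_normal((N_ACOUSTIC, 16)) * 0.1
b1 = np.zeros(16)
W2 = rng.standard_normal((16, N_LIP)) * 0.1
b2 = np.zeros(N_LIP)

def speech_frame_to_lip_params(features: np.ndarray) -> np.ndarray:
    """Map one frame of acoustic features to normalized lip parameters."""
    h = np.tanh(features @ W1 + b1)               # hidden nonlinearity
    return 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))  # squash to (0, 1)

frame = rng.standard_normal(N_ACOUSTIC)
params = speech_frame_to_lip_params(frame)
print(params.shape)  # (3,)
```

Running such a mapping frame by frame yields a parameter trajectory that can drive a parameterized facial model, which is the general pipeline the abstract describes.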
