On the active perception of speech by robots

We describe an autonomous-agent approach to automatic speech recognition based on linking two models: a virtual environment model (VEM) and a virtual speaker model (VSM). The VEM is a system that can generate synthetic signals of different wavelengths and record real-world data from a camera and a microphone. The VSM is a speech synthesis model whose controllable parameters allow it to synthesize speech signals that vary according to the characteristics of an unknown speaker. The VEM and VSM are instantiated to train artificial neural networks that extract and integrate the auditory and visual information paths for robust automatic speech recognition. Such an instance is called an autonomous speech recognition agent (ASRA) or, equivalently, a speech robot. In this framework, the problem of robust automatic speech recognition amounts to selecting the best ASRA for a given pair of VEM and VSM. The paper describes the simulation environment and presents potential applications of this new model to data fusion, ASRA evaluation, and the emergent properties of auto-adaptive systems.
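The selection step ("the best ASRA for a given pair of VEM and VSM") can be illustrated with a toy sketch. Everything in it is an assumption made for illustration, not the authors' system. `VEM` and `VSM` are reduced to hypothetical classes: the VSM applies a speaker gain to made-up acoustic prototypes and emits made-up lip-shape prototypes, and the VEM adds Gaussian noise to each channel. A nearest-centroid classifier is swapped in for the paper's neural networks. Each candidate ASRA differs only in which information path it uses (audio, video, or their concatenation), and the "best" ASRA is simply the one with the highest held-out accuracy for the given VEM/VSM pair.

```python
import random
from dataclasses import dataclass

# Hypothetical per-class prototypes (not from the paper): 3-D acoustic, 2-D visual (lip) features.
# Classes 0 and 1 are acoustically close but visually distinct.
ACOUSTIC = {0: [1.0, 0.0, 0.0], 1: [1.1, 0.1, 0.0], 2: [0.0, 1.0, 1.0]}
VISUAL = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.5, 0.5]}


@dataclass
class VSM:
    """Hypothetical virtual speaker: a gain stands in for speaker characteristics."""
    gain: float

    def utter(self, label):
        return [self.gain * x for x in ACOUSTIC[label]], list(VISUAL[label])


@dataclass
class VEM:
    """Hypothetical virtual environment: additive Gaussian noise on each channel."""
    audio_noise: float
    video_noise: float = 0.1

    def observe(self, audio, video, rng):
        a = [x + rng.gauss(0.0, self.audio_noise) for x in audio]
        v = [x + rng.gauss(0.0, self.video_noise) for x in video]
        return a, v


def features(path, a, v):
    if path == "audio":
        return a
    if path == "video":
        return v
    return a + v  # naive early fusion: concatenate both information paths


def sample(vem, vsm, n, rng):
    data = []
    for _ in range(n):
        label = rng.randrange(len(ACOUSTIC))
        a, v = vem.observe(*vsm.utter(label), rng)
        data.append((a, v, label))
    return data


class ASRA:
    """Toy agent: nearest-centroid classifier (a stand-in for a neural network)."""

    def __init__(self, path):
        self.path = path
        self.centroids = {}

    def train(self, data):
        sums, counts = {}, {}
        for a, v, y in data:
            x = features(self.path, a, v)
            s = sums.setdefault(y, [0.0] * len(x))
            for i, xi in enumerate(x):
                s[i] += xi
            counts[y] = counts.get(y, 0) + 1
        self.centroids = {y: [si / counts[y] for si in s] for y, s in sums.items()}

    def predict(self, a, v):
        x = features(self.path, a, v)
        return min(self.centroids,
                   key=lambda y: sum((xi - ci) ** 2 for xi, ci in zip(x, self.centroids[y])))

    def accuracy(self, data):
        return sum(self.predict(a, v) == y for a, v, y in data) / len(data)


def select_best_asra(vem, vsm, paths=("audio", "video", "audio+video"), seed=0):
    """Train one ASRA per information path; return the best path and all held-out scores."""
    rng = random.Random(seed)
    train_set, test_set = sample(vem, vsm, 300, rng), sample(vem, vsm, 300, rng)
    scores = {}
    for p in paths:
        agent = ASRA(p)
        agent.train(train_set)
        scores[p] = agent.accuracy(test_set)
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    best, scores = select_best_asra(VEM(audio_noise=2.0), VSM(gain=1.0))
    print(best, scores)
```

With heavy acoustic noise (`VEM(audio_noise=2.0)`), the video-only agent should come out on top, and naive concatenation falls well below it because the noisy audio dimensions dominate the distance. This is one reason to treat the fusion strategy as something to select per VEM/VSM pair rather than fix in advance.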
