Audio-visual sensor fusion system for intelligent sound sensing

An intelligent sensing system is proposed that autonomously extracts a target sound signal from multi-microphone signals corrupted by interfering ambient noise. Although many types of intelligent signal receivers with multiple sensors have been proposed recently, the use of audio-visual sensor fusion techniques is a distinguishing feature of the system described here. The sensor fusion system is divided into two subsystems: an audio subsystem and a visual subsystem. The audio subsystem extracts the target signal with a digital filter composed of tapped delay lines and adjustable weights. These weights are updated by a special adaptive algorithm called the "cue signal method". For adaptation, the cue signal method needs only a narrow-bandwidth signal that correlates with the power level of the target signal. This narrow-bandwidth signal is called the "cue signal". The role of the visual subsystem is, therefore, to generate a cue signal. The authors have already proposed methods for generating a cue signal from video images; in that work, sensor fusion of audio and visual information was accomplished by simple methods. In this paper, two new sensor fusion techniques are proposed. One generates a cue signal using not only video images but also microphone signals, and the other generates a cue signal using microphone signals, video images and internal knowledge. Both are forms of hierarchical sensor fusion of audio and visual information. In order to evaluate and demonstrate the sensor fusion algorithms, a real-time processing system including seventy DSPs was constructed. The architecture of this system is also described.
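
The filter structure of the audio subsystem can be illustrated with a minimal sketch, assuming NumPy and a plain filter-and-sum arrangement in which each microphone feeds its own tapped delay line. The function names `filter_and_sum` and `short_time_power` are hypothetical, and the weight-update rule of the cue signal method is not given in this abstract, so it is not shown. The second function only illustrates the kind of slowly varying power envelope that a cue signal is said to correlate with.

```python
import numpy as np


def filter_and_sum(x, w):
    """Multi-microphone tapped-delay-line filter (illustrative only).

    x: array of shape (M, N), one row of N samples per microphone.
    w: array of shape (M, K), K adjustable tap weights per microphone.
    Returns y with y[n] = sum_m sum_k w[m, k] * x[m, n - k],
    treating samples before n = 0 as zero.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    n_mics, n_samples = x.shape
    y = np.zeros(n_samples)
    for m in range(n_mics):
        # Full convolution has N + K - 1 samples; the first N are the
        # causal FIR output of this microphone's delay line.
        y += np.convolve(x[m], w[m])[:n_samples]
    return y


def short_time_power(y, frame):
    """Mean power over non-overlapping frames of `frame` samples.

    Assumption: this stands in for the power level of the extracted
    signal; trailing samples that do not fill a frame are dropped.
    """
    y = np.asarray(y, dtype=float)
    n_frames = len(y) // frame
    frames = y[: n_frames * frame].reshape(n_frames, frame)
    return (frames ** 2).mean(axis=1)
```

With two microphones and two taps each, `filter_and_sum(x, w)` returns one output sample per input sample, and `short_time_power(y, frame)` reduces that output to one value per frame, which is the low-rate quantity one could compare against a cue signal.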
