Speech enhancement and recognition using circular microphone array for service robots

This paper addresses speech recognition using a circular microphone array. Eight microphones are placed around a service robot to form a 2D microphone array. To enhance speech quality, a novel adaptive beamformer composed of a delay-and-sum beamformer, adaptive blocking filters (ABFs), and adaptive cancelling filters (ACFs) is proposed. Whereas the adaptive generalized sidelobe canceller (AGSC) connects the ABFs and ACFs in a feedforward configuration, the proposed beamformer connects them in feedback. The proposed structure has two advantages: it is robust to steering vector errors and crosstalk, and it achieves the same speech quality as the AGSC while using fewer filter taps. Experimental results show that the proposed structure outperforms the AGSC in both objective and subjective evaluations. Speech recognition results show that the proposed robust adaptive beamformer maintains recognition performance even in low-SNR, highly reverberant environments.
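To make the delay-and-sum stage concrete, the following is a minimal sketch of far-field steering for an eight-microphone circular array. It is an illustration under stated assumptions, not the paper's implementation: the array radius (0.15 m), the speed of sound (343 m/s), the plane-wave model, and all function names are hypothetical choices, and the adaptive blocking and cancelling filters of the proposed feedback structure are not reproduced here.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value


def mic_positions(num_mics=8, radius=0.15):
    """Return (x, y) positions of microphones evenly spaced on a circle.

    The radius is an assumed value; the abstract does not state it.
    """
    return [
        (radius * math.cos(2 * math.pi * m / num_mics),
         radius * math.sin(2 * math.pi * m / num_mics))
        for m in range(num_mics)
    ]


def arrival_delays(positions, azimuth_rad):
    """Far-field arrival delay (seconds) at each mic, relative to the array centre.

    A microphone closer to the source receives the wavefront earlier,
    hence the minus sign.
    """
    ux, uy = math.cos(azimuth_rad), math.sin(azimuth_rad)
    return [-(x * ux + y * uy) / SPEED_OF_SOUND for x, y in positions]


def das_response(positions, look_rad, source_rad, freq_hz):
    """Normalised magnitude response of a delay-and-sum beamformer.

    The beamformer is steered to look_rad; the input is a plane wave of
    frequency freq_hz arriving from source_rad.
    """
    steer = arrival_delays(positions, look_rad)
    actual = arrival_delays(positions, source_rad)
    omega = 2 * math.pi * freq_hz
    # Each channel is time-aligned by its steering delay, then channels are averaged.
    total = sum(cmath.exp(-1j * omega * (a - s)) for a, s in zip(actual, steer))
    return abs(total) / len(positions)


if __name__ == "__main__":
    mics = mic_positions()
    print(das_response(mics, 0.0, 0.0, 1000.0))          # source in the look direction
    print(das_response(mics, 0.0, math.pi / 2, 1000.0))  # source 90 degrees off the look direction
```

A source in the look direction is summed coherently and passes with unit gain, while an off-axis source is attenuated. A steering error shifts the alignment and degrades the target signal, which is the kind of mismatch the proposed structure is reported to be robust against.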
