Improvement of speech recognition performance for spoken-oriented robot dialog system using end-fire array

In this paper, we propose a microphone array structure for a spoken-oriented robot dialog system that is designed to discriminate between the direction of arrival (DOA) of the target speech and that of the robot internal noise. First, we investigate the performance of noise estimation by semi-blind source separation (SBSS) in the presence of both diffuse background noise and robot internal noise. The result indicates that SBSS estimates the noise poorly under these conditions. Next, we analyze the DOA of the robot internal noise to determine the cause of this result; we find that the internal noise is always in phase at the microphone array and therefore overlaps spatially with the target speech. Based on this finding, we propose changing the microphone array structure from a broadside array to an end-fire array so that the DOAs of the target speech and the internal noise can be discriminated. Finally, we evaluate word accuracy in a dictation task in the presence of both diffuse background noise and robot internal noise to confirm the advantage of the proposed structure. Simulation results show that the proposed microphone array structure improves speech recognition performance by approximately 10%.
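
The geometric argument for switching from a broadside to an end-fire array can be illustrated with a small sketch. It is not the authors' implementation: it assumes a far-field plane wave, a two-microphone array, a hypothetical spacing of 2 cm, a speed of sound of 343 m/s, and a talker standing directly in front of the robot. The function name `inter_mic_delay` and all numeric values are illustrative assumptions. Following the abstract, the internal noise is modeled as in phase at the array, that is, with zero inter-microphone delay.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def inter_mic_delay(spacing_m: float, angle_deg: float,
                    c: float = SPEED_OF_SOUND) -> float:
    """Arrival-time difference (s) between two microphones for a far-field
    plane wave. The angle is measured from the array normal (broadside)."""
    return spacing_m * math.sin(math.radians(angle_deg)) / c


spacing = 0.02  # m, hypothetical microphone spacing

# Internal noise: in phase at the array (per the abstract), so zero delay.
noise_delay = 0.0

# Broadside array: the talker in front of the robot lies on the array normal.
broadside_target_delay = inter_mic_delay(spacing, 0.0)

# End-fire array: the array axis points at the talker, 90 degrees from the normal.
endfire_target_delay = inter_mic_delay(spacing, 90.0)

print(f"broadside: target {broadside_target_delay:.2e} s, noise {noise_delay:.2e} s")
print(f"end-fire:  target {endfire_target_delay:.2e} s, noise {noise_delay:.2e} s")
```

Under these assumptions the broadside target delay coincides with the noise delay, so the two sources cannot be told apart by DOA, whereas the end-fire target delay equals the spacing divided by the speed of sound and is clearly separated from the in-phase noise.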
