Far-Field Continuous Speech Recognition System Based on Speaker Localization and Sub-Band Beamforming

This paper proposes a distant speech recognition system based on a novel speaker localization and beamforming (SRLB) algorithm. To localize the speaker, a steered response power (SRP) algorithm that exploits the harmonic structure of the speech signal is proposed. The new scheme can also verify the speaker on the basis of fundamental frequency variation, so it can be used to design a speech recognition system for verified speakers. The performance of a Farsi speech recognition engine is then evaluated under severe noise and reverberation. Simulation results and tests on real data show that, with the proposed localization scheme, recognition accuracy under high noise and reverberation improves by 28% compared with single-channel recognition. The ability of the algorithm to localize a verified speaker also improves the system's robustness to speech noise and reduces recognition errors by up to 52% in its presence.
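The idea of steered response power restricted to the harmonic structure of speech can be illustrated with a small sketch. This is not the authors' SRLB algorithm; it assumes a uniform linear microphone array in the far field, a fundamental frequency f0 that is already known (for example from a separate pitch tracker), and a plain delay-and-sum beam whose output power is summed only at the first few harmonics k·f0. The names `dft_at`, `harmonic_srp`, `localize`, and `simulate_array`, as well as the speed of sound of 343 m/s, are hypothetical choices made for this illustration. It uses only the Python standard library, so the spectra are computed with a direct single-frequency DFT rather than an FFT.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at about 20 degrees C


def dft_at(x, freq, fs):
    """Single-frequency DFT of the frame x evaluated at freq (Hz)."""
    w = -2j * math.pi * freq / fs
    return sum(sample * cmath.exp(w * n) for n, sample in enumerate(x))


def harmonic_srp(frames, mic_positions, fs, f0, num_harmonics, angles_deg):
    """Steered response power summed only over the harmonics k*f0.

    frames: one list of samples per microphone, all the same length.
    mic_positions: microphone coordinates along the array axis (metres).
    Signal model: mic m receives s(t - tau_m), tau_m = d_m*sin(theta)/c.
    """
    harmonics = [k * f0 for k in range(1, num_harmonics + 1)]
    # The spectra at k*f0 do not depend on the steering angle,
    # so compute them once per microphone.
    spectra = [[dft_at(x, f, fs) for f in harmonics] for x in frames]

    powers = []
    for angle in angles_deg:
        sin_theta = math.sin(math.radians(angle))
        power = 0.0
        for h, f in enumerate(harmonics):
            # Frequency-domain delay-and-sum: multiplying by
            # exp(+j*2*pi*f*tau_m) undoes each mic's propagation delay.
            beam = sum(
                spectra[m][h]
                * cmath.exp(2j * math.pi * f * d * sin_theta / SPEED_OF_SOUND)
                for m, d in enumerate(mic_positions)
            )
            power += abs(beam) ** 2
        powers.append(power)
    return powers


def localize(frames, mic_positions, fs, f0, num_harmonics=5, step_deg=1):
    """Return the candidate angle (degrees from broadside) with maximum SRP."""
    angles = list(range(-90, 91, step_deg))
    powers = harmonic_srp(frames, mic_positions, fs, f0, num_harmonics, angles)
    best_angle, _ = max(zip(angles, powers), key=lambda pair: pair[1])
    return best_angle


def simulate_array(angle_deg, mic_positions, fs, f0, num_harmonics, num_samples):
    """Hypothetical test input: a noiseless harmonic source in the far field."""
    sin_theta = math.sin(math.radians(angle_deg))
    frames = []
    for d in mic_positions:
        tau = d * sin_theta / SPEED_OF_SOUND
        frames.append([
            sum(math.sin(2 * math.pi * k * f0 * (n / fs - tau))
                for k in range(1, num_harmonics + 1))
            for n in range(num_samples)
        ])
    return frames


if __name__ == "__main__":
    fs, f0 = 16000, 200.0
    mics = [0.0, 0.05, 0.10, 0.15]  # 4-mic linear array, 5 cm spacing
    # 800 samples = 10 periods of 200 Hz at 16 kHz, so every harmonic
    # falls exactly on a DFT bin and there is no leakage between harmonics.
    frames = simulate_array(30, mics, fs, f0, num_harmonics=5, num_samples=800)
    print(localize(frames, mics, fs, f0))  # prints 30
```

Summing the beam power only at multiples of f0 is one simple way to make the steered response favour the talker whose pitch is being tracked, since a competing talker with a different f0 tends to put less energy at those frequencies. The sub-band beamforming and the fundamental-frequency-based verification described in the paper are not reproduced by this sketch.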
