Hands-free speech recognition based on 3-D Viterbi search using a microphone array

A microphone array is a promising solution for realizing hands-free speech recognition in real environments. Speech recognition with a microphone array depends on accurate talker localization, but localizing a moving talker is difficult in noisy, reverberant environments, and localization errors degrade recognition performance. To solve this problem, this paper proposes a new speech recognition algorithm that considers multiple talker direction hypotheses simultaneously. The proposed algorithm performs a Viterbi search in a 3-dimensional trellis space composed of talker directions, input frames, and HMM states. The locus of the talker and the phoneme sequence of the speech are then obtained together by finding the optimal path with the highest likelihood. To evaluate the performance of the proposed algorithm, speech recognition experiments are carried out on simulated data and real environment data. The results show that the proposed algorithm works well even if the talker moves.
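
The search over directions, frames, and HMM states can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes that a beamformer has already been steered to each candidate direction and that every resulting frame has been scored against every HMM state, giving a hypothetical table `log_obs[d][t][s]`. The function name `viterbi_3d` and the direction transition table `log_dir_trans` (a way to penalize implausible jumps of the talker) are likewise assumptions introduced for illustration; the abstract does not say how direction changes are weighted.

```python
NEG_INF = float("-inf")


def viterbi_3d(log_obs, log_state_trans, log_dir_trans, log_init):
    """Joint Viterbi search over (direction, frame, state).

    log_obs[d][t][s]       log-likelihood of frame t, steered to direction d, in state s
    log_state_trans[p][s]  log-probability of moving from HMM state p to state s
    log_dir_trans[p][d]    assumed log-weight of moving from direction p to direction d
    log_init[d][s]         log-probability of starting in direction d and state s

    Returns (best_log_score, direction_path, state_path).
    """
    n_dirs = len(log_obs)
    n_frames = len(log_obs[0])
    n_states = len(log_obs[0][0])

    # delta[d][s]: best log score of any path ending at (d, s) in the current frame.
    delta = [[log_init[d][s] + log_obs[d][0][s] for s in range(n_states)]
             for d in range(n_dirs)]
    back = []  # back[t - 1][d][s] = (previous direction, previous state)

    for t in range(1, n_frames):
        new_delta = [[NEG_INF] * n_states for _ in range(n_dirs)]
        ptr = [[None] * n_states for _ in range(n_dirs)]
        for d in range(n_dirs):
            for s in range(n_states):
                best, arg = NEG_INF, None
                # Maximize over every predecessor cell in the previous frame.
                for pd in range(n_dirs):
                    for ps in range(n_states):
                        score = (delta[pd][ps]
                                 + log_dir_trans[pd][d]
                                 + log_state_trans[ps][s])
                        if score > best:
                            best, arg = score, (pd, ps)
                if arg is not None:
                    new_delta[d][s] = best + log_obs[d][t][s]
                    ptr[d][s] = arg
        delta = new_delta
        back.append(ptr)

    # Pick the best final cell, then trace the path backwards.
    best, d, s = max((delta[d][s], d, s)
                     for d in range(n_dirs) for s in range(n_states))
    if best == NEG_INF:
        raise ValueError("no path with finite likelihood")
    directions, states = [d], [s]
    for ptr in reversed(back):
        d, s = ptr[d][s]
        directions.append(d)
        states.append(s)
    directions.reverse()
    states.reverse()
    return best, directions, states
```

The returned direction path plays the role of the talker locus and the state path can be mapped back to a phoneme sequence. The sketch costs O(T·D²·S²) operations for T frames, D directions, and S states; a practical system would likely restrict transitions to neighboring directions and to the HMM topology to reduce this.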
