Composite Decision by Bayesian Inference in Distant-Talking Speech Recognition

This paper describes an integrated system that produces a composite recognition output for distant-talking speech when recognition results from multiple microphone inputs are available. In many cases, the composite result has a lower error rate than any of the individual channel outputs. In this work, the composite result is obtained by Bayesian inference, under the assumption that the log-likelihood score follows a Gaussian distribution, at least approximately. First, the distribution of the likelihood score is estimated on a development set. Next, a confidence interval for the likelihood score is used to discard unreliable microphone channels. Finally, for every channel, the area under the distribution between the likelihood score of each hypothesis and that of the (N+1)st hypothesis is computed, and these areas are integrated across all channels by Bayesian inference. The proposed system shows considerable performance improvement over both the conventional method of summing likelihoods and each of the individual channel results.
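The three steps above can be illustrated with a minimal sketch. The abstract gives no formulas, so everything below is an assumption made for illustration, not the authors' implementation. It assumes the following: a separate Gaussian is fitted to each channel's development-set log-likelihood scores; a channel is kept only if its best score falls inside a central 95% interval; each channel supplies an (N+1)-best list sorted by descending score; the per-channel areas are treated as independent likelihoods under a uniform prior; and a hypothesis missing from a channel's N-best list receives a small floor value. The function names (`fit_channel_models`, `reliable`, `channel_evidence`, `combine`) and the `floor` parameter are hypothetical.

```python
import math
from statistics import NormalDist


def fit_channel_models(dev_scores):
    """Fit a Gaussian to each channel's development-set log-likelihood scores
    (assumption: one distribution per microphone channel)."""
    return {ch: NormalDist.from_samples(scores) for ch, scores in dev_scores.items()}


def reliable(dist, top_score, level=0.95):
    """Keep a channel only if its best score lies inside the central
    `level` confidence interval of that channel's fitted Gaussian."""
    lo = dist.inv_cdf((1 - level) / 2)
    hi = dist.inv_cdf((1 + level) / 2)
    return lo <= top_score <= hi


def channel_evidence(dist, nbest):
    """Area under the Gaussian between each hypothesis score and the
    (N+1)st score. `nbest` holds N+1 (hypothesis, score) pairs sorted by
    descending score; the last entry serves only as the reference point."""
    ref = dist.cdf(nbest[-1][1])
    return {hyp: dist.cdf(score) - ref for hyp, score in nbest[:-1]}


def combine(models, nbest_by_channel, level=0.95, floor=1e-6):
    """Posterior over hypotheses, assuming a uniform prior and independent
    channels; `floor` (hypothetical) stands in for hypotheses a channel missed."""
    evidence = [
        channel_evidence(models[ch], nb)
        for ch, nb in nbest_by_channel.items()
        if reliable(models[ch], nb[0][1], level)
    ]
    if not evidence:
        raise ValueError("all channels were rejected as unreliable")
    hyps = set().union(*evidence)
    # Sum log-likelihoods across channels (product of per-channel areas).
    log_post = {
        h: sum(math.log(max(ev.get(h, 0.0), floor)) for ev in evidence)
        for h in hyps
    }
    # Normalize with log-sum-exp for numerical stability.
    m = max(log_post.values())
    z = sum(math.exp(v - m) for v in log_post.values())
    return {h: math.exp(v - m) / z for h, v in log_post.items()}


if __name__ == "__main__":
    dev = {ch: [-100.0, -110.0, -120.0, -105.0, -115.0] for ch in ("mic1", "mic2", "mic3")}
    models = fit_channel_models(dev)
    nbest = {
        "mic1": [("open the door", -105.0), ("open the drawer", -108.0), ("often the door", -115.0)],
        "mic2": [("open the drawer", -107.0), ("open the door", -109.0), ("opened a door", -116.0)],
        "mic3": [("hope in the dark", -200.0), ("open the door", -201.0), ("oh pen", -202.0)],
    }
    post = combine(models, nbest)
    print(max(post, key=post.get), post)
```

In this toy run, the score of `mic3` lies far outside its confidence interval, so that channel contributes nothing. Each remaining channel ranks a different hypothesis first, and the product of areas decides between them. Fitting one Gaussian per channel rather than a single pooled one is a design choice made here to let channels with different gain or noise levels keep separate score scales. The paper may do otherwise.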

[1]  Kazuya Takeda,et al.  Speech recognition based on space diversity using distributed multi-microphone , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).

[2]  W. M. Bolstad Introduction to Bayesian Statistics , 2004 .

[3]  Hong-Seok Kim,et al.  Using a real-time, tracking microphone array as input to an HMM speech recognizer , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[4]  Satoshi Nakamura,et al.  Distant-talking speech recognition based on a 3-D Viterbi search using a microphone array , 2002, IEEE Trans. Speech Audio Process..

[5]  Richard E. Neapolitan,et al.  Learning Bayesian networks , 2007, KDD '07.

[6]  Satoshi Nakamura,et al.  HMM-separation-based speech recognition for a distant moving speaker , 2001, IEEE Trans. Speech Audio Process..