Robust speech detection method for telephone speech recognition system

This paper describes speech endpoint detection methods for continuous speech recognition systems used over telephone networks. Speech input to these systems may be contaminated not only by various ambient noises but also by irrelevant sounds generated by users, such as coughs, tongue clicks, lip noises, and certain out-of-task utterances. Under these adverse conditions, robust speech endpoint detection remains an unsolved problem. In fact, we found that speech endpoint detection errors occurred in over 10% of the inputs in field trials of a voice-activated telephone extension system. These errors were caused by (1) low SNR, (2) long pauses between phrases, and (3) irrelevant sounds preceding task sentences. To solve the first two problems, we propose a real-time speech ending point detection algorithm based on the implicit approach, which finds a sentence end by comparing the likelihood of a complete sentence hypothesis with that of other hypotheses. For the third problem, we propose a speech beginning point detection algorithm that rejects irrelevant sounds by using likelihood ratio and duration conditions. The effectiveness of these methods was evaluated under various conditions. We found that the ending point detection algorithm was not affected by long pauses and that the beginning point detection algorithm successfully rejected irrelevant sounds by using phone HMMs that fit the task. Furthermore, a garbage model of irrelevant sounds was also evaluated; we found that the garbage modeling technique and the proposed method compensated for each other's weak points and that the best recognition accuracy was achieved by integrating the two.
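The two decision rules named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the per-frame normalization, the background model score, the thresholds, and every name below (`Segment`, `accept_as_speech_start`, `sentence_ended`, `min_llr_per_frame`, `min_frames`, `margin`) are assumptions introduced only for illustration, and the log-likelihoods are assumed to come from an external HMM decoder.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Segment:
    """A candidate sound segment with scores from a decoder (hypothetical)."""
    start_frame: int
    end_frame: int
    task_loglik: float        # log-likelihood under task phone HMMs (assumed)
    background_loglik: float  # log-likelihood under a background model (assumed)


def accept_as_speech_start(seg: Segment,
                           min_llr_per_frame: float = 0.5,
                           min_frames: int = 20) -> bool:
    """Accept a segment as a speech beginning only if it passes both a
    duration condition and a likelihood ratio condition.
    Threshold values are illustrative assumptions, not from the paper."""
    n_frames = seg.end_frame - seg.start_frame
    # Duration condition: very short sounds (clicks, lip noises) are rejected.
    if n_frames <= 0 or n_frames < min_frames:
        return False
    # Likelihood ratio condition, in the log domain and normalized per frame.
    llr_per_frame = (seg.task_loglik - seg.background_loglik) / n_frames
    return llr_per_frame >= min_llr_per_frame


def sentence_ended(complete_loglik: float,
                   other_logliks: Sequence[float],
                   margin: float = 5.0) -> bool:
    """Declare a sentence end when the complete-sentence hypothesis beats
    every other active hypothesis by at least `margin` (assumed value)."""
    best_other = max(other_logliks, default=float("-inf"))
    return complete_loglik - best_other >= margin


if __name__ == "__main__":
    cough = Segment(0, 10, task_loglik=-50.0, background_loglik=-80.0)
    speech = Segment(0, 50, task_loglik=-100.0, background_loglik=-150.0)
    print(accept_as_speech_start(cough))    # too short, rejected
    print(accept_as_speech_start(speech))   # long enough, high ratio
    print(sentence_ended(-200.0, [-210.0, -230.0]))
```

Note that the ending rule in this sketch depends only on hypothesis scores, not on a silence timer, which is consistent with the reported insensitivity to long pauses; how the hypotheses are actually scored and compared in real time is left to the paper.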
