Speech recognition for an information kiosk

In the context of the ESPRIT MASK project, the problem is to adapt a state-of-the-art laboratory speech recognizer for real-world use by naive users. The recognizer is a software-only system that runs in real time on a standard RISC processor. All aspects of the recognizer have been reconsidered, from signal capture to adaptive acoustic models and language models. The resulting system includes features such as microphone selection, response cancellation, noise compensation, query rejection, and decoding strategies for real-time recognition.
