Paper information - Connected digit recognition experiments with the OGI Toolkit's neural network and HMM-based recognizers

This paper describes a series of experiments that compare different approaches to training a speaker-independent continuous-speech digit recognizer using the CSLU Toolkit. Comparisons are made between the hidden Markov model (HMM) and neural network (NN) approaches. In addition, a description of the CSLU Toolkit research environment is given. The CSLU Toolkit is a research and development software environment that provides a powerful and flexible tool for creating and using spoken language systems for telephone and PC applications. In particular, the CSLU-HMM, the CSLU-NN, and the CSLU-FBNN development environments, with which our experiments were implemented, are described in detail and recognition results are compared. Our speech corpus is OGI 30K-Numbers, which is a collection of spontaneous ordinal and cardinal numbers, continuous digit strings and isolated digit strings. The utterances were recorded by having a large number of people recite their ZIP code, street address, or other numeric information over the telephone. This corpus represents a very noisy and difficult recognition task. Our best results (98% word recognition, 92% sentence recognition), obtained with the FBNN architecture, suggest the effectiveness of the CSLU Toolkit in building real-life speech recognition systems.
