Connected digit recognition experiments with the OGI Toolkit's neural network and HMM-based recognizers

This paper describes a series of experiments comparing approaches to training a speaker-independent, continuous-speech digit recognizer with the CSLU Toolkit, specifically the hidden Markov model (HMM) and neural network (NN) approaches. It also describes the CSLU Toolkit, a research and development software environment for creating and using spoken language systems for telephone and PC applications. In particular, the CSLU-HMM, CSLU-NN, and CSLU-FBNN development environments, in which our experiments were implemented, are described in detail, and their recognition results are compared. Our speech corpus is OGI 30K-Numbers, a collection of spontaneous ordinal and cardinal numbers, continuous digit strings, and isolated digit strings. The utterances were recorded by having a large number of people recite their ZIP code, street address, or other numeric information over the telephone. This corpus represents a very noisy and difficult recognition task. Our best results (98% word recognition, 92% sentence recognition), obtained with the FBNN architecture, suggest the effectiveness of the CSLU Toolkit in building real-life speech recognition systems.
