Evaluation and integration of neural-network training techniques for continuous digit recognition

This paper describes a set of experiments on neural-network training and search techniques that, when combined, have resulted in a 54% reduction in error on the continuous digits recognition task. The best system had word-level accuracy of 97.52% on a test set of the OGI 30K Numbers corpus, which contains naturally-produced continuous digit strings recorded over telephone channels. Experiments investigated effects of the feature set, the amount of data used for training, the type of context-dependent categories to be recognized, the values for duration limits, and the type of grammar. The experiments indicate that the grammar and duration limits had a greater effect on recognition accuracy than the output categories, cepstral features, or a 50% increase in the amount of training data.

[1]  Hervé Bourlard,et al.  A new approach towards keyword spotting , 1993, EUROSPEECH.

[2]  Ronald A. Cole,et al.  Connected digit recognition experiments with the OGI Toolkit's neural network and HMM-based recognizers , 1998, Proceedings 1998 IEEE 4th Workshop Interactive Voice Technology for Telecommunications Applications. IVTTA '98 (Cat. No.98TH8376).

[3]  Sarel van Vuuren,et al.  Improved neural network training of inter-word context units for connected digit recognition , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[4]  R. Cole,et al.  TELEPHONE SPEECH CORPUS DEVELOPMENT AT CSLU , 1998 .

[5]  J. Tebelskis,et al.  Speech recognition using neural networks , 1996 .

[6]  Yonghong Yan,et al.  Speech recognition using neural networks with forward-backward probability generated targets , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.