A hybrid neural network, dynamic programming word spotter

A novel keyword-spotting system that combines both neural network and dynamic programming techniques is presented. This system makes use of the strengths of time delay neural networks (TDNNs), which include strong generalization ability, potential for parallel implementations, robustness to noise, and time shift invariant learning. Dynamic programming models are used by this system because they have the useful capability of time warping input speech patterns. This system was trained and tested on the Stonehenge Road Rally database, which is a 20-keyword-vocabulary, speaker-independent, continuous-speech corpus. Currently, this system performs at a figure of merit (FOM) rate of 82.5%. FOM is the detection rate averaged from 0 to 10 false alarms per keyword hour. This measure is explained in detail.<<ETX>>

[1]  Richard Rose,et al.  A hidden Markov model based keyword recognition system , 1990, International Conference on Acoustics, Speech, and Signal Processing.

[2]  Alex Waibel,et al.  Consonant recognition by modular construction of large phonemic time-delay neural networks , 1989, International Conference on Acoustics, Speech, and Signal Processing,.

[3]  Alex Waibel,et al.  Integrating time alignment and neural networks for high performance continuous speech recognition , 1991, [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing.

[4]  Geoffrey E. Hinton,et al.  Phoneme recognition using time-delay neural networks , 1989, IEEE Trans. Acoust. Speech Signal Process..

[5]  D. P. Morgan,et al.  Multiple neural network topologies applied to keyword spotting , 1991, [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing.

[6]  L. G. Miller,et al.  Improvements and applications for key word recognition using hidden Markov modeling techniques , 1991, [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing.