Continuous speech recognition with the connectionist Viterbi training procedure: a summary of recent work

Various ways of combining hidden Markov models (HMMs) and neural networks (NNs) for continuous speech recognition are studied. The authors describe the connectionist Viterbi training (CVT) procedure, discuss the factors most important to its design, and report its recognition performance. Several changes to the system are reported: (1) replacing recurrent NNs with non-recurrent NNs, (2) replacing Sphinx-style phone-based HMMs with word-based HMMs, (3) adding a corrective training procedure, and (4) adding an alternate model for every word. With these changes, the CVT system achieved 99.1% word accuracy and 98.0% string accuracy on the TI/NBS connected digits task.
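As its name suggests, CVT relies on Viterbi alignment of speech frames to HMM states. The following minimal sketch illustrates only that alignment step, under stated assumptions. It force-aligns T frames to the S states of one hypothetical left-to-right word HMM, where each state either repeats or advances by one per frame. It uses per-frame log scores (here, hypothetically, network outputs) and fixed, illustrative transition log-probabilities. The function name `viterbi_align` and its parameters are hypothetical and do not come from the paper. In a training loop of this kind, the resulting frame labels would serve as targets for retraining the network. That retraining step, and the paper's actual scoring and transition model, are not shown.

```python
import math
from typing import List


def viterbi_align(
    log_scores: List[List[float]],
    self_loop: float = math.log(0.5),  # hypothetical fixed "stay" log-probability
    advance: float = math.log(0.5),    # hypothetical fixed "advance" log-probability
) -> List[int]:
    """Force-align T frames to S left-to-right HMM states (illustrative sketch).

    log_scores[t][s] is a per-frame log score for state s, assumed here to come
    from a neural network. The path must start in state 0, end in state S-1,
    and at each frame either stay in the same state or advance by exactly one.
    Returns the best state index for every frame.
    """
    T, S = len(log_scores), len(log_scores[0])
    if T < S:
        raise ValueError("need at least one frame per state")
    neg_inf = float("-inf")
    delta = [[neg_inf] * S for _ in range(T)]  # best path score ending at (t, s)
    back = [[0] * S for _ in range(T)]         # predecessor state for backtrace
    delta[0][0] = log_scores[0][0]
    for t in range(1, T):
        for s in range(S):
            stay = delta[t - 1][s] + self_loop
            move = delta[t - 1][s - 1] + advance if s > 0 else neg_inf
            if stay >= move:
                delta[t][s], back[t][s] = stay + log_scores[t][s], s
            else:
                delta[t][s], back[t][s] = move + log_scores[t][s], s - 1
    # Backtrace from the final state at the last frame.
    path = [S - 1]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


if __name__ == "__main__":
    hi, lo = math.log(0.8), math.log(0.1)
    scores = [[hi, lo, lo], [hi, lo, lo], [lo, hi, lo],
              [lo, hi, lo], [lo, lo, hi], [lo, lo, hi]]
    print(viterbi_align(scores))  # frame-level state labels
```

With equal transition scores, every valid path pays the same transition cost, so the alignment follows the highest per-frame scores. The example therefore labels the six frames as states 0, 0, 1, 1, 2, 2.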