A comparison of learning techniques in speech recognition

Template-based recognition systems overcome errors in the short-term matching process by comparing whole sequences of acoustic events. In many vocabularies, each word has a highly distinctive sequence. Some vocabularies, however, contain confusable words with very similar sequences, which leads to poor recognition performance. Discriminability among similar words may be improved by altering the matching algorithm or by improving the reference template set. Both approaches are instances of multi-exemplar learning techniques, which improve recognition performance through automatic evaluation of training data. This paper examines several such techniques using isolated utterances and highly ambiguous vocabularies (e.g., the "E" set: 3, B, C, D, E, G, P, V, T, Z) in a speaker-dependent recognition system. A system that combined featural and template information gave the best performance for six out of eight speakers. With this technique, the E-set error rate fell from 37% to 10%.
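
As an illustration of whole-sequence template matching with several exemplars per word, the following is a minimal sketch only. It assumes dynamic time warping (DTW) with a Euclidean frame distance as the matching algorithm, which the abstract does not specify, and the names `dtw_distance` and `recognize` are hypothetical. It is not the system evaluated in the paper.

```python
import math


def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of feature vectors.

    Assumption: frames are equal-length tuples of floats and the local
    distance is Euclidean. The paper's actual matcher is not described here.
    """
    n, m = len(a), len(b)
    # cost[i][j] = best accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # frame of a repeated against b[j-1]
                cost[i][j - 1],      # frame of b repeated against a[i-1]
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


def recognize(utterance, templates):
    """Return the word whose closest exemplar has the lowest DTW cost.

    templates: dict mapping each word to a list of exemplar sequences
    (a multi-exemplar reference set, in the abstract's terms).
    """
    best_word, best_cost = None, math.inf
    for word, exemplars in templates.items():
        for exemplar in exemplars:
            d = dtw_distance(utterance, exemplar)
            if d < best_cost:
                best_word, best_cost = word, d
    return best_word


# Toy one-dimensional "acoustic" sequences; the values are made up.
templates = {
    "B": [[(0.0,), (1.0,), (1.0,)]],
    "D": [[(0.0,), (3.0,), (3.0,)]],
}
print(recognize([(0.0,), (0.0,), (1.0,), (1.0,)], templates))
```

In a sketch like this, the two routes to better discriminability named in the abstract correspond to changing `dtw_distance` (the matching algorithm) or changing the contents of `templates` (the reference template set).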