Paper information - A comparison of learning techniques in speech recognition

Template-based recognition systems overcome errors in the short-term matching process by comparing whole sequences of acoustic events. In many vocabularies, each word has a highly distinctive sequence. Some vocabularies have confusable words with very similar sequences, leading to poor recognition performance. Improvements in discriminability among similar words may be achieved by altering the matching algorithm, or by improving the reference template set. Both techniques are instances of multi-exemplar learning techniques which improve recognition performance through automatic evaluation of training data. This paper examines several such techniques using isolated utterances and highly ambiguous vocabularies (e.g., the "E" set; 3 B C D E G P V T Z) in a speaker-dependent recognition system. A system which combined both featural and template information led to the best performance for six out of eight speakers. Using this technique, E-set error rates improved from 37% to 10%.
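As a rough illustration of the template-matching idea described in the abstract, the sketch below classifies an utterance by comparing its whole frame sequence against several stored exemplars per word and picking the closest one. It is not the paper's method: the use of dynamic time warping (DTW), the squared Euclidean frame distance, the one-dimensional toy features, and the names `frame_distance`, `dtw_distance`, and `recognize` are all assumptions made for this example.

```python
from typing import Dict, List, Optional, Sequence

Frame = Sequence[float]  # one acoustic feature vector (hypothetical toy features)


def frame_distance(a: Frame, b: Frame) -> float:
    """Squared Euclidean distance between two feature frames (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dtw_distance(utterance: List[Frame], template: List[Frame]) -> float:
    """Accumulated cost of the best monotonic alignment of two frame sequences."""
    n, m = len(utterance), len(template)
    inf = float("inf")
    # cost[i][j] = best cost of aligning the first i frames with the first j frames
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(utterance[i - 1], template[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # utterance frame repeats against the template
                cost[i][j - 1],      # template frame repeats against the utterance
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]


def recognize(
    utterance: List[Frame], templates: Dict[str, List[List[Frame]]]
) -> Optional[str]:
    """Return the word whose closest exemplar has the lowest DTW cost."""
    best_word: Optional[str] = None
    best_score = float("inf")
    for word, exemplars in templates.items():
        # Multiple exemplars per word: keep the best-matching one
        score = min(dtw_distance(utterance, t) for t in exemplars)
        if score < best_score:
            best_word, best_score = word, score
    return best_word


# Toy reference set with two confusable "words" and invented feature values
templates = {
    "B": [[[0.0], [1.0], [1.0]], [[0.0], [0.9], [1.1]]],
    "D": [[[0.0], [2.0], [2.0]]],
}
utterance = [[0.0], [0.0], [1.1], [1.0]]
result = recognize(utterance, templates)
```

Keeping several exemplars per word and scoring against the nearest one is only one simple way to use a multi-exemplar reference set; the paper compares several such techniques, including one that combines featural and template information, which this sketch does not attempt to reproduce.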