Speaker-independent isolated word recognition for telephone voice using phoneme-like templates
This paper describes a speaker-independent isolated word recognition algorithm for telephone voice and reports its recognition performance. The algorithm consists of two processes: dynamic time warping and statistical word discrimination. In the first process, the input speech is compared with each word template using dynamic time warping. Multiple templates per word are used to handle speech variation across speakers, and each word template is represented as a sequence of phoneme-like templates. To achieve high recognition accuracy, a new technique for generating word templates is proposed. In the second process, statistical word discrimination is applied to word candidates whose first-process results have relatively low reliability. Discrimination functions are computed from statistics of the transition tendencies of speech characteristics between adjacent frames, and the final word decision is made from these functions. The system was trained on utterances from 1305 speakers and tested on utterances from 259 speakers. An average recognition rate of 96.5% was obtained for a 16-word Japanese vocabulary.
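The first process can be illustrated with a minimal sketch: each word has several templates, each template is a sequence of phoneme-like template labels that is expanded into frame vectors, the input is scored against every template with dynamic time warping, and the closest template decides each word's score. Everything concrete here is an assumption for illustration only: the toy 2-D codebook, the label names, the Euclidean local distance, the symmetric DTW step weights with length normalization, and the hypothetical margin test used to decide when a candidate is "low reliability". The abstract does not specify the paper's distance measure, template-generation technique, or reliability criterion, and the second (statistical discrimination) process is not implemented; the sketch only flags inputs that would be passed to it.

```python
import math
from typing import Dict, List, Sequence, Tuple

# A feature vector for one analysis frame (e.g. a short-time spectrum).
Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Local distance between two frames (assumed Euclidean here)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(x: Sequence[Vector], y: Sequence[Vector]) -> float:
    """Symmetric DTW: diagonal steps weighted 2, horizontal/vertical steps 1.

    With these weights every complete path has total weight len(x) + len(y),
    so dividing by that sum gives a length-normalized average local distance.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = euclidean(x[i - 1], y[j - 1])
            d[i][j] = min(d[i - 1][j] + c, d[i][j - 1] + c, d[i - 1][j - 1] + 2 * c)
    return d[n][m] / (n + m)


def expand(labels: Sequence[str], codebook: Dict[str, Vector]) -> List[Vector]:
    """Turn a word template (a sequence of phoneme-like labels) into frames."""
    return [codebook[label] for label in labels]


def recognize(
    frames: Sequence[Vector],
    word_templates: Dict[str, List[List[str]]],
    codebook: Dict[str, Vector],
    margin_threshold: float,
) -> Tuple[str, bool, List[Tuple[str, float]]]:
    """Return (best word, needs_second_stage, ranked (word, distance) list)."""
    scores = []
    for word, templates in word_templates.items():
        # Multiple templates per word: keep the closest one.
        best = min(dtw_distance(frames, expand(t, codebook)) for t in templates)
        scores.append((word, best))
    scores.sort(key=lambda s: s[1])
    # Hypothetical reliability test: a small gap between the two best words
    # marks the decision as unreliable and hands it to the second process.
    margin = scores[1][1] - scores[0][1] if len(scores) > 1 else math.inf
    return scores[0][0], margin < margin_threshold, scores


# Toy 2-D "phoneme-like templates" and word templates (illustrative only).
codebook = {"a": (1.0, 0.0), "i": (0.0, 1.0), "sil": (0.0, 0.0)}
word_templates = {
    "ai": [["sil", "a", "a", "i", "sil"], ["sil", "a", "i", "i", "sil"]],
    "ia": [["sil", "i", "i", "a", "sil"]],
}
frames = [(0.1, 0.0), (0.9, 0.1), (1.0, 0.0), (0.1, 0.9), (0.0, 0.0)]
word, needs_second_stage, ranking = recognize(frames, word_templates, codebook, 0.05)
print(word, needs_second_stage, ranking)
```

Taking the minimum over a word's templates is one simple way to let several speaker-dependent variants represent the same word; raising `margin_threshold` sends more inputs to the second process.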