100000-word recognition using acoustic-segment networks
A speech recognition system for a 100,000-word vocabulary is described. Acoustic-segment networks serve as the word templates in recognition. These networks are generated automatically from the orthographic strings of the words, using rules that account for several kinds of variation in speech. To reduce the amount of computation in recognition, the networks are represented as a tree, and a preselection method based on input-frame sampling is applied. With preselection passing 500 candidates to the main matching stage, 98.75% of the computation is eliminated without a significant increase in error. Top-20 recognition accuracy is 93.5% on 10,000 test utterances spoken by five male and five female speakers.
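The abstract names two computation-saving devices, a tree representation of the word networks and a preselection stage driven by sampled input frames, but does not say how either is built. The Python sketch below shows one simple way the two ideas can fit together, under heavy simplifying assumptions: each word is a single linear string of segment labels (no variation network), each input frame is a scalar, each segment has a scalar template, and the coarse score uses a uniform frame-to-segment alignment over every `step`-th frame. All names (`build_tree`, `coarse_score`, `preselect`, the labels and templates) are hypothetical illustrations, not the paper's method.

```python
from typing import Dict, List, Sequence


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.words: List[str] = []  # words whose segment string ends at this node


def build_tree(lexicon: Dict[str, Sequence[str]]) -> TrieNode:
    """Merge shared prefixes of the (assumed linear) segment strings into one tree,
    so a shared prefix only needs to be matched once."""
    root = TrieNode()
    for word, segments in lexicon.items():
        node = root
        for seg in segments:
            node = node.children.setdefault(seg, TrieNode())
        node.words.append(word)
    return root


def count_nodes(node: TrieNode) -> int:
    """Number of segment nodes below `node` (the root itself is not counted)."""
    return sum(1 + count_nodes(child) for child in node.children.values())


def coarse_score(frames: Sequence[float], segments: Sequence[str],
                 templates: Dict[str, float], step: int) -> float:
    """Cheap mean distance using only every `step`-th frame and a uniform
    (linear) alignment of the sampled frames onto the word's segments.
    Assumes `frames` is non-empty."""
    sampled = frames[::step]
    n, m = len(sampled), len(segments)
    total = 0.0
    for i, x in enumerate(sampled):
        seg = segments[min(i * m // n, m - 1)]  # segment assigned to this frame
        total += abs(x - templates[seg])
    return total / n


def preselect(frames: Sequence[float], lexicon: Dict[str, Sequence[str]],
              templates: Dict[str, float], step: int, n_best: int) -> List[str]:
    """Return the `n_best` words with the lowest coarse score; only these
    would be passed on to the (more expensive) main matching stage."""
    scores = {w: coarse_score(frames, segs, templates, step)
              for w, segs in lexicon.items()}
    return sorted(scores, key=scores.get)[:n_best]
```

With a toy lexicon of `kami`, `kasa`, `kata`, and `sora`, the four words contain 16 segments in total, but the tree needs only 12 nodes because the first three share the prefix `k a`. In a real system the preselection list size (500 in the abstract) trades the cost of main matching against the risk of pruning the correct word.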