100000-word recognition using acoustic-segment networks

This paper describes speech recognition for a vocabulary of 100000 words. Acoustic-segment networks serve as the word templates. The networks are generated automatically from the orthographic strings of the words, using rules that account for several kinds of variation in speech. To reduce the amount of computation during recognition, two techniques are used: a tree representation of the networks, and a preselection method based on input-frame sampling. With the preselection stage outputting 500 candidates for main matching, 98.75% of the computation can be eliminated without a significant increase in errors. Top-20 recognition accuracy is 93.5% on 10000 test utterances from five male and five female speakers.
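
The two-stage idea of preselection followed by main matching can be illustrated with a minimal sketch. It is not the paper's implementation: here each word template is assumed to be a plain sequence of feature frames rather than an acoustic-segment network, the cheap preselection score is a linear alignment over every `step`-th input frame, and main matching is ordinary dynamic time warping. All function names and parameters (`preselect`, `recognize`, `step`, `n_candidates`, `top_n`) are hypothetical.

```python
import heapq


def frame_distance(a, b):
    """Squared Euclidean distance between two feature frames."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def linear_match(input_frames, template_frames):
    """Cheap score: stretch the template linearly over the input frames
    and average the frame distances (no warping search)."""
    n, m = len(input_frames), len(template_frames)
    total = 0.0
    for i, frame in enumerate(input_frames):
        j = min(m - 1, i * m // n)  # proportional position in the template
        total += frame_distance(frame, template_frames[j])
    return total / n


def dtw_distance(a, b):
    """Full dynamic-time-warping distance, used here as the main matching
    (an assumption; the paper matches against networks)."""
    inf = float("inf")
    prev = [inf] * (len(b) + 1)
    prev[0] = 0.0
    for x in a:
        cur = [inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            cur[j] = frame_distance(x, y) + min(prev[j], prev[j - 1], cur[j - 1])
        prev = cur
    return prev[len(b)]


def preselect(input_frames, templates, step, n_candidates):
    """Score every word cheaply on every `step`-th input frame and keep
    the `n_candidates` best words."""
    sampled = input_frames[::step]
    scored = ((linear_match(sampled, t), w) for w, t in templates.items())
    return [w for _, w in heapq.nsmallest(n_candidates, scored)]


def recognize(input_frames, templates, step=4, n_candidates=500, top_n=20):
    """Run the expensive matching only on the preselected candidates."""
    candidates = preselect(input_frames, templates, step, n_candidates)
    scored = ((dtw_distance(input_frames, templates[w]), w) for w in candidates)
    return [w for _, w in heapq.nsmallest(top_n, scored)]
```

With a 100000-word vocabulary and 500 candidates, only 0.5% of the words reach the expensive stage in this sketch; the cost of the preselection pass itself depends on the sampling step and on how cheap the preselection score is.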
