New-word addition and adaptation in a stochastic explicit-segment speech recognition system

The authors extend an automatic procedure for adding new words to a speech recognition system so that it includes alternative pronunciations for the new words. They also investigate methods for adapting to new words after they are added to the system. The goal of adaptation is to improve the system's accuracy on the new words using only a limited amount of speech data. All experiments are performed within the stochastic explicit-segment speech recognition system. The authors evaluate 25 isolated city names from CITRON, a speech corpus collected from real users over the telephone network. For this task, the error rate falls from 34%, when the system is trained on the NTIMIT database alone, to 8% after adapting to an average of 30 tokens of each new word.
