Using phone durations in Finnish large vocabulary continuous speech recognition

Finnish is one of the languages in which phone durations distinguish words from one another, so they play a significant role in correct speech recognition. Modern large vocabulary continuous speech recognizers offer no adequate means of modeling these durations, although such modeling is needed to handle a language like Finnish seamlessly. Explicit measures are therefore required to distinguish certain words from each other, because the only cue separating them may be prosodic, namely duration. This work studies an extension of an existing speech recognition system that adds models for discriminatively important phone durations. The explicit duration model yielded a 5% relative reduction in the letter error rate of the recognition task.
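
To make the evaluation measure concrete, the following is a minimal sketch of how a letter error rate and a relative reduction could be computed. It assumes the letter error rate is the character-level edit distance divided by the reference length. The function names, the word pair "tuuli"/"tuli" (a Finnish pair that differs only in vowel length), and the error rates 0.20 and 0.19 are illustrative assumptions, not taken from the work described above.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in letters."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference letters
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def letter_error_rate(ref: str, hyp: str) -> float:
    """Letter errors divided by the number of reference letters (assumed definition)."""
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction of the baseline error rate removed by the improved system."""
    return (baseline - improved) / baseline


# Hypothetical case: a long vowel recognized as a short one costs one letter.
ler = letter_error_rate("tuuli", "tuli")

# Hypothetical rates: dropping from 0.20 to 0.19 is a 5% relative reduction.
gain = relative_reduction(0.20, 0.19)
```

Under these assumptions a single duration confusion already shows up as a letter error, which is why a letter-level measure is sensitive to the kind of mistake a duration model is meant to prevent.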
