The demiphone: an efficient subword unit for continuous speech recognition

In this paper we introduce the demiphone as a context-dependent phonetic unit for continuous speech recognition. Each phone is divided into two parts: a left demiphone, which accounts for coarticulation with the preceding phone, and a right demiphone, which models the influence of the following phone. This unit ignores any interaction between the left- and right-context effects, but in exchange it allows better training of the transitions between phones. The demiphone can be viewed as a heuristic clustering of states that yields smoother training of the hidden Markov models and also provides a simple way to build triphones that were not seen in training. We report experimental evidence that demiphones outperform the usual combination of triphones, left- and right-context biphones, and monophones.
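The decomposition described above can be sketched in a few lines of Python. This is a minimal illustration under assumptions that do not come from the paper: phones are plain strings, demiphones use HTK-style names (`l-p` for the left half of phone `p` after `l`, `p+r` for the right half of `p` before `r`), utterance edges are padded with a hypothetical `sil` symbol, and all function names are illustrative. The sketch shows how a phone string maps to demiphones, how a triphone `l-p+r` is assembled from its two halves, and why the inventory is smaller (at most 2N² demiphones versus N³ triphones for N phones, ignoring phonotactic constraints). It does not model HMM states or training.

```python
from typing import List, Set, Tuple


def left_demiphone(left: str, phone: str) -> str:
    # First half of `phone`: depends only on the preceding phone.
    # Naming "l-p" is an assumed HTK-style convention, not the paper's.
    return f"{left}-{phone}"


def right_demiphone(phone: str, right: str) -> str:
    # Second half of `phone`: depends only on the following phone.
    return f"{phone}+{right}"


def split_triphone(left: str, phone: str, right: str) -> Tuple[str, str]:
    # A triphone l-p+r is approximated by two independent halves,
    # discarding any joint dependence on both contexts.
    return left_demiphone(left, phone), right_demiphone(phone, right)


def to_demiphones(phones: List[str], boundary: str = "sil") -> List[str]:
    # Map a phone string to its demiphone sequence; `boundary` is a
    # hypothetical padding symbol for the utterance edges.
    padded = [boundary, *phones, boundary]
    units: List[str] = []
    for i in range(1, len(padded) - 1):
        units.extend(split_triphone(padded[i - 1], padded[i], padded[i + 1]))
    return units


def can_synthesize(triphone: Tuple[str, str, str], trained: Set[str]) -> bool:
    # A triphone is buildable if both of its halves were trained,
    # even when the triphone itself never occurred in the training data.
    return all(unit in trained for unit in split_triphone(*triphone))


def unit_counts(n_phones: int) -> Tuple[int, int]:
    # Upper bounds ignoring phonotactics: N^3 triphones vs 2 * N^2 demiphones.
    return n_phones ** 3, 2 * n_phones ** 2


if __name__ == "__main__":
    units = to_demiphones(["k", "a", "s", "a"])
    print(units)
    # ['sil-k', 'k+a', 'k-a', 'a+s', 'a-s', 's+a', 's-a', 'a+sil']
    trained = set(units)
    print(can_synthesize(("k", "a", "sil"), trained))  # True
    print(can_synthesize(("s", "a", "k"), trained))    # False: 'a+k' untrained
    print(unit_counts(25))                             # (15625, 1250)
```

In the example, the triphone `k-a+sil` never occurs in the training string (only `sil-k+a`, `k-a+s`, `a-s+a` and `s-a+sil` do), yet both of its halves were trained, so it can be assembled. Note also that the pair `k+a`, `k-a` depends only on the phones `k` and `a`; in this sketch every occurrence of the k-to-a transition therefore updates the same two units, which is one way to read the abstract's claim about better-trained transitions.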
