Phone duration models for fast broadcast news transcriptions

Phone duration modeling in HMM-based LVCSR systems has been widely studied over the last few years. In this paper, we address the problem of duration modeling in the particular context of fast decoding for an LVCSR task. Discrete duration distributions are integrated into the LIA broadcast news transcription system, and the influence of duration modeling is studied under various pruning schemes. Experimental results show that duration modeling significantly improves pruning efficiency. Second, we show that durations are intrinsically dependent on acoustic context: crossed experiments combining context-independent acoustic models with context-dependent duration models show how durations are affected by acoustic context. Finally, we propose a rate-dependent model of phone durations. This method significantly outperforms our system based on rate-independent duration models. Overall, integrating rate-dependent models yields an absolute WER reduction of 3.3%.
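As an illustration of what a discrete, rate-dependent phone duration model could look like, the following is a minimal sketch and not the implementation used in the LIA system. It assumes durations are counted in frames, that each training sample carries a hypothetical speaking-rate class label, and that histograms are smoothed additively; the names `train_duration_models`, `duration_log_prob`, `MAX_DUR`, and `alpha` are all illustrative assumptions.

```python
import math
from collections import Counter, defaultdict

MAX_DUR = 30  # hypothetical cap on phone duration, in frames


def clamp(dur, max_dur=MAX_DUR):
    """Map any duration to the supported range 1..max_dur."""
    return min(max(dur, 1), max_dur)


def train_duration_models(samples, max_dur=MAX_DUR, alpha=1.0):
    """Estimate one discrete log-probability table per (phone, rate class).

    samples: iterable of (phone, rate_class, duration_in_frames).
    alpha: additive smoothing constant (an assumption, not from the paper).
    """
    counts = defaultdict(Counter)
    for phone, rate, dur in samples:
        counts[(phone, rate)][clamp(dur, max_dur)] += 1

    models = {}
    for key, hist in counts.items():
        total = sum(hist.values()) + alpha * max_dur
        # Entry d-1 holds log P(duration = d | phone, rate class).
        models[key] = [
            math.log((hist[d] + alpha) / total) for d in range(1, max_dur + 1)
        ]
    return models


def duration_log_prob(models, phone, rate, dur, max_dur=MAX_DUR):
    """Look up the duration log-probability for a hypothesized phone segment."""
    return models[(phone, rate)][clamp(dur, max_dur) - 1]
```

In a decoder, a score of this kind would typically be added, with a tunable weight, to the acoustic and language model scores of a hypothesis when a phone segment ends, so that hypotheses with implausible durations fall below the pruning threshold sooner.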
