A recognition method with parametric trajectory synthesized using direct relations between static and dynamic feature vector time series

Parametric trajectory models have been proposed to exploit the time dependency of speech features. However, parametric trajectory modeling methods cannot take advantage of efficient HMM training and recognition methods. We previously proposed a speech recognition technique that generates a speech trajectory using an HMM-based speech synthesis method. This method generates an acoustic trajectory by maximizing the likelihood of the trajectory while taking into account the relation between the cepstrum, delta-cepstrum, and delta-delta cepstrum. In this paper, we extend our method to a general formulation that includes a variance training procedure. Speaker-independent speech recognition experiments show that the proposed method is effective.
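To make the trajectory generation step concrete, the following is a minimal sketch of maximum-likelihood parameter generation under dynamic-feature constraints, in the spirit of HMM-based speech synthesis. It is an illustration under stated assumptions, not the formulation used in this paper: it handles a single one-dimensional feature, uses only a static and a delta stream (the delta-delta stream is omitted for brevity), assumes the delta window 0.5 * (c[t+1] - c[t-1]) with clamped edges, and assumes per-frame Gaussian means and variances are already given by a fixed state alignment. The function names `build_window_matrix` and `generate_trajectory` are hypothetical. With observations written as o = W c, the likelihood is maximized by solving (W' P W) c = W' P mu, where P is the diagonal precision matrix.

```python
import numpy as np


def build_window_matrix(num_frames):
    """Build W such that o = W @ c stacks [static_t, delta_t] for each frame.

    Assumed delta window: delta_t = 0.5 * (c[t+1] - c[t-1]), with the
    frame index clamped at the sequence edges.
    """
    T = num_frames
    W = np.zeros((2 * T, T))
    for t in range(T):
        W[2 * t, t] = 1.0                 # static row: picks c[t]
        prev_t = max(t - 1, 0)            # clamp at the left edge
        next_t = min(t + 1, T - 1)        # clamp at the right edge
        W[2 * t + 1, next_t] += 0.5       # delta row: +0.5 * c[t+1]
        W[2 * t + 1, prev_t] -= 0.5       # delta row: -0.5 * c[t-1]
    return W


def generate_trajectory(static_mean, delta_mean, static_var, delta_var):
    """Return the static trajectory c that maximizes the Gaussian likelihood
    of o = W @ c given per-frame means and diagonal variances."""
    static_mean = np.asarray(static_mean, dtype=float)
    delta_mean = np.asarray(delta_mean, dtype=float)
    static_var = np.asarray(static_var, dtype=float)
    delta_var = np.asarray(delta_var, dtype=float)

    T = len(static_mean)
    W = build_window_matrix(T)

    # Interleave static and delta statistics to match the row order of W.
    mu = np.empty(2 * T)
    mu[0::2] = static_mean
    mu[1::2] = delta_mean
    precision = np.empty(2 * T)
    precision[0::2] = 1.0 / static_var
    precision[1::2] = 1.0 / delta_var

    # Normal equations of the weighted least-squares problem.
    A = W.T @ (precision[:, None] * W)
    b = W.T @ (precision * mu)
    return np.linalg.solve(A, b)


if __name__ == "__main__":
    # Step-shaped static means with zero delta means: the delta constraint
    # pulls the generated trajectory toward a smoother transition.
    static_mean = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
    delta_mean = [0.0] * 6
    trajectory = generate_trajectory(
        static_mean, delta_mean, np.ones(6), 0.1 * np.ones(6)
    )
    print(np.round(trajectory, 3))
```

In a recognition setting, a trajectory generated this way for each hypothesis could be compared against the observed static features to produce a score; how that score is defined and how the variances are trained is specific to the paper and is not reproduced here.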
