A stochastic model of speech incorporating hierarchical nonstationarity

The concept of two-level (global and local) hierarchical nonstationarity is introduced to describe the elastic and dynamic nature of the speech signal. A doubly stochastic process model is developed to implement this concept. In the model, the global nonstationarity is embodied in an underlying Markov chain that governs the evolution of the parameters in a set of output stochastic processes. The local nonstationarity is realized through state-conditioned, time-varying first- and second-order statistics in the output data-generation process models. The local nonstationarity is represented in a parametric form, with a view to potential uses in automatically uncovering relationally invariant properties of the speech signal and in speech recognition. Preliminary experiments on fitting the models to speech data show that the proposed model outperforms several traditional types of hidden Markov models.
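The two levels of nonstationarity described above can be illustrated with a small generative sketch. It is not the authors' implementation: it assumes scalar Gaussian outputs, a polynomial trend in the mean, a polynomial trend in the log standard deviation, and a time index that restarts on entry to each state. The names `sample_trended_hmm`, `mean_trends`, and `log_std_trends` are hypothetical and chosen only for this illustration.

```python
import math
import random


def poly(coeffs, t):
    """Evaluate c0 + c1*t + c2*t^2 + ... at time t."""
    return sum(c * t ** k for k, c in enumerate(coeffs))


def sample_trended_hmm(trans, mean_trends, log_std_trends, n_frames, seed=0):
    """Draw a scalar sequence from a Markov chain whose states emit
    Gaussian samples with time-varying mean and standard deviation.

    trans[i][j]       : probability of moving from state i to state j
    mean_trends[i]    : polynomial coefficients of the mean in state i
    log_std_trends[i] : polynomial coefficients of log(std) in state i
    (all three parameterizations are assumptions made for this sketch)
    """
    rng = random.Random(seed)
    state, t_local = 0, 0  # start in state 0 with the local clock at zero
    states, obs = [], []
    for _ in range(n_frames):
        # Local nonstationarity: the first- and second-order statistics
        # depend on the time elapsed since the current state was entered.
        mean = poly(mean_trends[state], t_local)
        std = math.exp(poly(log_std_trends[state], t_local))
        obs.append(rng.gauss(mean, std))
        states.append(state)
        # Global nonstationarity: a Markov chain selects the next state,
        # and therefore the next set of trend parameters.
        nxt = rng.choices(range(len(trans)), weights=trans[state])[0]
        t_local = t_local + 1 if nxt == state else 0
        state = nxt
    return states, obs


if __name__ == "__main__":
    # Two-state left-to-right chain: a rising segment, then a falling one.
    trans = [[0.9, 0.1], [0.0, 1.0]]
    mean_trends = [[0.0, 0.5], [4.0, -0.2]]
    log_std_trends = [[-2.0], [-2.0, 0.05]]
    states, obs = sample_trended_hmm(trans, mean_trends, log_std_trends, 30)
    for s, x in zip(states, obs):
        print(s, round(x, 3))
```

Setting every mean trend to a single constant coefficient and every log-standard-deviation trend to a constant reduces this sketch to a standard Gaussian hidden Markov model with stationary states, which is the kind of baseline the abstract compares against.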
