Modeling speech parameter sequences with latent trajectory Hidden Markov model

This paper proposes a probabilistic generative model of a sequence of vectors called the latent trajectory hidden Markov model (HMM). While a conventional HMM is only capable of describing piecewise stationary sequences of data vectors, the proposed model can describe continuously time-varying sequences governed by discrete hidden states. This property makes it well suited to modeling many kinds of time-series data that are continuous in nature, such as speech spectra. Given a sequence of observed data, the optimal state sequence can be decoded using the expectation-maximization (EM) algorithm. Given a set of training examples, the underlying model parameters can be trained with either the EM algorithm or variational inference.
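To make the contrast with a conventional HMM concrete, the sketch below shows the standard trajectory-style computation that this model family builds on: given a fixed state sequence, the maximum-likelihood static-feature trajectory is obtained by coupling state-dependent means of static and delta features through a linear constraint o = Wc and solving a weighted least-squares problem. This is a minimal illustration under simplifying assumptions (1-D features, a simple 0.5*(c_{t+1} - c_{t-1}) delta window, diagonal variances), not the paper's exact algorithm; the function names are hypothetical.

```python
import numpy as np

def build_delta_matrix(T):
    """W maps static features c (length T) to stacked static+delta
    observations o = W c, with delta_t = 0.5*(c_{t+1} - c_{t-1})."""
    W = np.zeros((2 * T, T))
    for t in range(T):
        W[2 * t, t] = 1.0                  # static row
        if t > 0:
            W[2 * t + 1, t - 1] = -0.5     # delta row, left neighbor
        if t < T - 1:
            W[2 * t + 1, t + 1] = 0.5      # delta row, right neighbor
    return W

def generate_trajectory(state_seq, mu_static, mu_delta, var_static, var_delta):
    """ML trajectory for a fixed state sequence (1-D features):
    c* = (W' S^-1 W)^-1 W' S^-1 mu, where mu and S stack the
    state-dependent static/delta means and variances over time."""
    T = len(state_seq)
    W = build_delta_matrix(T)
    mu = np.empty(2 * T)
    prec = np.empty(2 * T)                 # diagonal of S^-1
    for t, q in enumerate(state_seq):
        mu[2 * t], mu[2 * t + 1] = mu_static[q], mu_delta[q]
        prec[2 * t], prec[2 * t + 1] = 1.0 / var_static[q], 1.0 / var_delta[q]
    A = W.T @ (prec[:, None] * W)
    b = W.T @ (prec * mu)
    return np.linalg.solve(A, b)

# Example: two states with static means 0.0 and 1.0. A conventional HMM would
# produce an abrupt step at the state boundary; the generated trajectory
# transitions smoothly because the delta constraint penalizes large jumps.
states = np.array([0] * 10 + [1] * 10)
traj = generate_trajectory(states,
                           mu_static=[0.0, 1.0], mu_delta=[0.0, 0.0],
                           var_static=[1.0, 1.0], var_delta=[0.01, 0.01])
print(np.round(traj, 2))
```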
