Probabilistic feature mapping based on trajectory HMMs

This paper proposes a feature mapping algorithm based on the trajectory GMM or trajectory HMM. Although GMM- or HMM-based feature mapping algorithms work effectively, their conversion quality sometimes degrades because frame-by-frame conversion produces inappropriate dynamic characteristics. Using dynamic features can alleviate this problem, but it also introduces an inconsistency between training and mapping. The proposed algorithm resolves this inconsistency while retaining the benefits of dynamic features, and it transforms the entire sequence at once rather than converting frame by frame. Experimental results in voice conversion show that the proposed algorithm outperforms the conventional one in both objective and subjective tests.
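The sequence-level transformation described above relies on an explicit relationship between static and dynamic features, o = W c. Here c stacks the static features of all frames, and W appends their deltas. The following is a minimal sketch under several assumptions: a one-dimensional static feature, the delta window 0.5 (c_{t+1} - c_{t-1}) with edge frames repeated, and diagonal covariances. Suppose per-frame Gaussian means mu and variances Sigma for [c_t, delta c_t] are already given, for example from a mixture or state sequence chosen beforehand. The static sequence that maximizes the likelihood is then c = (W^T Sigma^-1 W)^-1 W^T Sigma^-1 mu. For a fixed state sequence, the mean of the trajectory HMM takes this same form. The helper names are hypothetical. The sketch does not cover model training or mixture/state selection, and it is not the paper's implementation.

```python
from typing import List

Matrix = List[List[float]]


def build_window_matrix(T: int) -> Matrix:
    """Hypothetical helper: the 2T x T matrix W with o = W c.

    Row 2t copies the static value c_t; row 2t+1 computes the delta
    0.5 * (c_{t+1} - c_{t-1}), repeating the first/last frame at the edges.
    """
    W = [[0.0] * T for _ in range(2 * T)]
    for t in range(T):
        W[2 * t][t] = 1.0
        prev, nxt = max(t - 1, 0), min(t + 1, T - 1)
        W[2 * t + 1][nxt] += 0.5
        W[2 * t + 1][prev] -= 0.5
    return W


def solve(A: Matrix, b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def generate_trajectory(means: List[float], variances: List[float]) -> List[float]:
    """Hypothetical helper: maximum-likelihood static trajectory under o = W c.

    means/variances hold per-frame (static, delta) pairs flattened as
    [s_0, d_0, s_1, d_1, ...]; covariances are assumed diagonal.
    Returns c = (W^T P W)^-1 W^T P mu with P = diag(1 / variances).
    """
    T = len(means) // 2
    W = build_window_matrix(T)
    prec = [1.0 / v for v in variances]
    # Normal equations of the whole-sequence weighted least-squares problem.
    A = [[sum(W[r][i] * prec[r] * W[r][j] for r in range(2 * T)) for j in range(T)]
         for i in range(T)]
    b = [sum(W[r][i] * prec[r] * means[r] for r in range(2 * T)) for i in range(T)]
    return solve(A, b)


if __name__ == "__main__":
    # Frame-wise static means jump from 0 to 1; all delta means are 0.
    means = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0]
    variances = [1.0, 0.1] * 4  # small delta variances favor smooth trajectories
    print(generate_trajectory(means, variances))  # approximately [0.417, 0.417, 0.583, 0.583]
```

This differs from frame-by-frame conversion because all frames are solved jointly. In the example, the abrupt 0-to-1 jump in the frame-wise means becomes a much smaller step, because the zero-mean delta statistics pull neighboring frames together. W^T P W is banded, so a practical implementation would use a banded solver. The dense elimination here is only for brevity.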
