An autoencoder neural-network based low-dimensionality approach to excitation modeling for HMM-based text-to-speech
[1] Junichi Yamagishi et al., "Towards an improved modeling of the glottal source in statistical parametric speech synthesis," SSW, 2007.
[2] Geoffrey E. Hinton et al., "Reducing the Dimensionality of Data with Neural Networks," Science, 2006.
[3] Takashi Saitoh et al., "An automatic pitch-marking method using wavelet transform," INTERSPEECH, 2000.
[4] Miguel Á. Carreira-Perpiñán et al., "On Contrastive Divergence Learning," AISTATS, 2005.
[5] Keiichi Tokuda et al., "Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis," EUROSPEECH, 1999.
[6] Thierry Dutoit et al., "Using a pitch-synchronous residual codebook for hybrid HMM/frame selection speech synthesis," IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2009.
[7] Heiga Zen et al., "A trainable excitation model for HMM-based speech synthesis," INTERSPEECH, 2007.
[8] Heiga Zen et al., "Tying covariance matrices to reduce the footprint of HMM-based speech synthesis systems," INTERSPEECH, 2009.
[9] Keiichi Tokuda et al., "On the state definition for a trainable excitation model in HMM-based speech synthesis," IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2008.