Speaker Adaptation of Various Components in Deep Neural Network based Speech Synthesis
暂无分享,去创建一个
[1] Keiichi Tokuda,et al. Speaker adaptation for HMM-based speech synthesis system using MLLR , 1998, SSW.
[2] Mark J. F. Gales. Cluster adaptive training of hidden Markov models , 2000, IEEE Trans. Speech Audio Process..
[3] Yifan Gong,et al. Investigating online low-footprint speaker adaptation using generalized linear regression and click-through data , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[4] Keiichi Tokuda,et al. Eigenvoices for HMM-based speech synthesis , 2002, INTERSPEECH.
[5] BART KOSKO,et al. Bidirectional associative memories , 1988, IEEE Trans. Syst. Man Cybern..
[6] Frank K. Soong,et al. TTS synthesis with bidirectional LSTM based recurrent neural networks , 2014, INTERSPEECH.
[7] Kaisheng Yao,et al. KL-divergence regularized deep neural network adaptation for improved large vocabulary speech recognition , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[8] George Saon,et al. Speaker adaptation of neural network acoustic models using i-vectors , 2013, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding.
[9] Zhizheng Wu,et al. A study of speaker adaptation for DNN-based speech synthesis , 2015, INTERSPEECH.
[10] Heiga Zen,et al. Statistical parametric speech synthesis using deep neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[11] Keiichi Tokuda,et al. Incorporating a mixed excitation model and postfilter into HMM-based text-to-speech synthesis , 2005 .
[12] Keiichi Tokuda,et al. Adaptation of pitch and spectrum for HMM-based speech synthesis using MLLR , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).
[13] Bhuvana Ramabhadran,et al. An autoencoder neural-network based low-dimensionality approach to excitation modeling for HMM-based text-to-speech , 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.
[14] Mark J. F. Gales,et al. Multi-basis adaptive neural network for rapid adaptation in speech recognition , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[15] Hideki Kawahara,et al. Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds , 1999, Speech Commun..
[16] Tuomo Raitio,et al. DNN-based stochastic postfilter for HMM-based speech synthesis , 2014, INTERSPEECH.
[17] Steve Renals,et al. Learning hidden unit contributions for unsupervised speaker adaptation of neural network acoustic models , 2014, 2014 IEEE Spoken Language Technology Workshop (SLT).
[18] Junichi Yamagishi,et al. Multiple feed-forward deep neural networks for statistical parametric speech synthesis , 2015, INTERSPEECH.
[19] Susan Fitt,et al. On generating combilex pronunciations via morphological analysis , 2010, INTERSPEECH.
[20] Keiichi Tokuda,et al. A Speech Parameter Generation Algorithm Considering Global Variance for HMM-Based Speech Synthesis , 2007, IEICE Trans. Inf. Syst..
[21] Mark J. F. Gales,et al. Maximum likelihood linear transformations for HMM-based speech recognition , 1998, Comput. Speech Lang..
[22] Dong Yu,et al. Modeling Spectral Envelopes Using Restricted Boltzmann Machines and Deep Belief Networks for Statistical Parametric Speech Synthesis , 2013, IEEE Transactions on Audio, Speech, and Language Processing.
[23] Yifan Gong,et al. Singular value decomposition based low-footprint speaker adaptation and personalization for deep neural network , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[24] Philip N. Garner,et al. Vocal Tract Length Normalization for Statistical Parametric Speech Synthesis , 2012, IEEE Transactions on Audio, Speech, and Language Processing.
[25] G. Kane. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol 1: Foundations, vol 2: Psychological and Biological Models , 1994 .
[26] Bhuvana Ramabhadran,et al. Prosody contour prediction with long short-term memory, bi-directional, deep recurrent neural networks , 2014, INTERSPEECH.
[27] Paul Smolensky,et al. Information processing in dynamical systems: foundations of harmony theory , 1986 .
[28] Takao Kobayashi,et al. Analysis of Speaker Adaptation Algorithms for HMM-Based Speech Synthesis and a Constrained SMAPLR Adaptation Algorithm , 2009, IEEE Transactions on Audio, Speech, and Language Processing.