Direct Modeling of Frequency Spectra and Waveform Generation Based on Phase Recovery for DNN-Based Speech Synthesis
暂无分享,去创建一个
[1] Keiichi Tokuda,et al. CELP coding based on mel-cepstral analysis , 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.
[2] Susan Fitt,et al. On generating combilex pronunciations via morphological analysis , 2010, INTERSPEECH.
[3] Yoshua Bengio,et al. SampleRNN: An Unconditional End-to-End Neural Audio Generation Model , 2016, ICLR.
[4] Bajibabu Bollepalli,et al. High-pitched excitation generation for glottal vocoding in statistical parametric speech synthesis using a deep neural network , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[5] Bhuvana Ramabhadran,et al. Prosody contour prediction with long short-term memory, bi-directional, deep recurrent neural networks , 2014, INTERSPEECH.
[6] Hirokazu Kameoka,et al. Generative Adversarial Network-Based Postfilter for STFT Spectrograms , 2017, INTERSPEECH.
[7] Zhizheng Wu,et al. Fusion of multiple parameterisations for DNN-based sinusoidal speech synthesis with multi-task learning , 2015, INTERSPEECH.
[8] Junichi Yamagishi,et al. A deep auto-encoder based low-dimensional feature extraction from FFT spectral envelopes for statistical parametric speech synthesis , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[9] Heiga Zen,et al. Statistical parametric speech synthesis using deep neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[10] Geoffrey E. Hinton,et al. Reducing the Dimensionality of Data with Neural Networks , 2006, Science.
[11] Jae S. Lim,et al. Signal estimation from modified short-time Fourier transform , 1983, ICASSP.
[12] Masanori Morise,et al. CheapTrick, a spectral envelope estimator for high-quality speech synthesis , 2015, Speech Commun..
[13] Yoshua Bengio,et al. Generative Adversarial Nets , 2014, NIPS.
[14] Dong Yu,et al. Modeling Spectral Envelopes Using Restricted Boltzmann Machines and Deep Belief Networks for Statistical Parametric Speech Synthesis , 2013, IEEE Transactions on Audio, Speech, and Language Processing.
[15] Bhiksha Raj,et al. Supervised and Semi-supervised Separation of Sounds from Single-Channel Mixtures , 2007, ICA.
[16] Hideki Kawahara,et al. Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds , 1999, Speech Commun..
[17] Heiga Zen,et al. WaveNet: A Generative Model for Raw Audio , 2016, SSW.
[18] S. King,et al. The Blizzard Challenge 2011 , 2011 .
[19] Hirokazu Kameoka,et al. Generative adversarial network-based postfilter for statistical parametric speech synthesis , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[20] Frank K. Soong,et al. TTS synthesis with bidirectional LSTM based recurrent neural networks , 2014, INTERSPEECH.