Deep neural network-based power spectrum reconstruction to improve quality of vocoded speech with limited acoustic parameters

Hisashi Kawai | Tomoki Toda | Yoshinori Shiga | Takuma Okamoto | Kentaro Tachibana

[1] Keiichi Tokuda, et al. An adaptive algorithm for mel-cepstral analysis of speech, 1992, [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[2] Keiichi Tokuda, et al. Speech parameter generation algorithms for HMM-based speech synthesis, 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings.

[3] Tomoki Toda, et al. Model Integration for HMM- and DNN-Based Speech Synthesis Using Product-of-Experts Framework, 2016, INTERSPEECH.

[4] Yoshua Bengio, et al. SampleRNN: An Unconditional End-to-End Neural Audio Generation Model, 2016, ICLR.

[5] Heiga Zen, et al. Details of the Nitech HMM-Based Speech Synthesis System for the Blizzard Challenge 2005, 2007, IEICE Trans. Inf. Syst.

[6] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.

[7] Patrick A. Naylor, et al. Detection of Glottal Closure Instants From Speech Signals: A Quantitative Review, 2012, IEEE Transactions on Audio, Speech, and Language Processing.

[8] Tuomo Raitio, et al. A Deep Generative Architecture for Postfiltering in Statistical Parametric Speech Synthesis, 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[9] Jae Lim, et al. Signal estimation from modified short-time Fourier transform, 1984.

[10] Hideki Kawahara, et al. Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds, 1999, Speech Commun.

[11] Masanori Morise, et al. WORLD: A Vocoder-Based High-Quality Speech Synthesis System for Real-Time Applications, 2016, IEICE Trans. Inf. Syst.

[12] Heiga Zen, et al. Statistical parametric speech synthesis using deep neural networks, 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[13] Bajibabu Bollepalli, et al. GlottDNN - A Full-Band Glottal Vocoder for Statistical Parametric Speech Synthesis, 2016, INTERSPEECH.