Improving FFTNet Vocoder with Noise Shaping and Subband Approaches
暂无分享,去创建一个
Tomoki Toda | Hisashi Kawai | Takuma Okamoto | Yoshinori Shiga | T. Toda | T. Okamoto | H. Kawai | Y. Shiga
[1] Haizhou Li,et al. A Voice Conversion Framework with Tandem Feature Sparse Representation and Speaker-Adapted WaveNet Vocoder , 2018, INTERSPEECH.
[2] Thomas S. Huang,et al. Fast Generation for Convolutional Autoregressive Models , 2017, ICLR.
[3] Tomoki Toda,et al. Speaker-Dependent WaveNet Vocoder , 2017, INTERSPEECH.
[4] Masanori Morise,et al. WORLD: A Vocoder-Based High-Quality Speech Synthesis System for Real-Time Applications , 2016, IEICE Trans. Inf. Syst..
[5] Jordi Bonada,et al. A Neural Parametric Singing Synthesizer Modeling Timbre and Expression from Natural Songs , 2017 .
[6] Yoshihiko Nankaku,et al. Statistical Voice Conversion Based on Wavenet , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[7] Zhen-Hua Ling,et al. Waveform Modeling Using Stacked Dilated Convolutional Neural Networks for Speech Bandwidth Extension , 2017, INTERSPEECH.
[8] Bajibabu Bollepalli,et al. Speaker-independent raw waveform model for glottal excitation , 2018, INTERSPEECH.
[9] Heiga Zen,et al. Deep Learning for Acoustic Modeling in Parametric Speech Generation: A systematic review of existing techniques and future trends , 2015, IEEE Signal Processing Magazine.
[10] Jindrich Matousek,et al. On the Analysis of Training Data for Wavenet-Based Speech Synthesis , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[11] Bo Chen,et al. High-quality Voice Conversion Using Spectrogram-Based WaveNet Vocoder , 2018, INTERSPEECH.
[12] Heiga Zen,et al. WaveNet: A Generative Model for Raw Audio , 2016, SSW.
[13] Tomoki Toda,et al. Collapsed speech segment detection and suppression for WaveNet vocoder , 2018, INTERSPEECH.
[14] Hisashi Kawai,et al. Deep neural network-based power spectrum reconstruction to improve quality of vocoded speech with limited acoustic parameters , 2018 .
[15] Yuxuan Wang,et al. Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron , 2018, ICML.
[16] Adam Coates,et al. Deep Voice: Real-time Neural Text-to-Speech , 2017, ICML.
[17] Navdeep Jaitly,et al. Natural TTS Synthesis by Conditioning Wavenet on MEL Spectrogram Predictions , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[18] Quan Wang,et al. Wavenet Based Low Rate Speech Coding , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[19] Tomoki Toda,et al. Statistical Voice Conversion with WaveNet-Based Waveform Generation , 2017, INTERSPEECH.
[20] Tomoki Toda,et al. An investigation of multi-speaker training for wavenet vocoder , 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[21] Jae Lim,et al. Signal estimation from modified short-time Fourier transform , 1984 .
[22] Oriol Vinyals,et al. Neural Discrete Representation Learning , 2017, NIPS.
[23] Hirokazu Kameoka,et al. Direct Modeling of Frequency Spectra and Waveform Generation Based on Phase Recovery for DNN-Based Speech Synthesis , 2017, INTERSPEECH.
[24] Bishnu S. Atal,et al. Predictive coding of speech signals and subjective error criteria , 1978, ICASSP.
[25] Ranniery Maia,et al. Postprocessing Synthetic Speech With a Complex Cepstrum Vocoder for Spoofing Phase-Based Synthetic Speech Detectors , 2017, IEEE Journal of Selected Topics in Signal Processing.
[26] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[27] Lauri Juvela,et al. A Comparison of Recent Waveform Generation and Acoustic Modeling Methods for Neural-Network-Based Speech Synthesis , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[28] Tomoki Toda,et al. An Investigation of Subband Wavenet Vocoder Covering Entire Audible Frequency Range with Limited Acoustic Features , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[29] Junichi Yamagishi,et al. A deep auto-encoder based low-dimensional feature extraction from FFT spectral envelopes for statistical parametric speech synthesis , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[30] Yu Gu,et al. Multi-task WaveNet: A Multi-task Generative Model for Statistical Parametric Speech Synthesis without Fundamental Frequency Conditions , 2018, INTERSPEECH.
[31] Sercan Ömer Arik,et al. Deep Voice 3: 2000-Speaker Neural Text-to-Speech , 2017, ICLR 2018.
[32] Hideki Kawahara,et al. Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds , 1999, Speech Commun..
[33] Seyed Hamidreza Mohammadi,et al. An overview of voice conversion systems , 2017, Speech Commun..
[34] Samy Bengio,et al. Tacotron: Towards End-to-End Speech Synthesis , 2017, INTERSPEECH.
[35] S. Boll,et al. Suppression of acoustic noise in speech using spectral subtraction , 1979 .
[36] Sercan Ömer Arik,et al. Deep Voice 2: Multi-Speaker Neural Text-to-Speech , 2017, NIPS.
[37] Erich Elsen,et al. Efficient Neural Audio Synthesis , 2018, ICML.
[38] Bajibabu Bollepalli,et al. GlottDNN - A Full-Band Glottal Vocoder for Statistical Parametric Speech Synthesis , 2016, INTERSPEECH.
[39] Xi Chen,et al. PixelCNN++: Improving the PixelCNN with Discretized Logistic Mixture Likelihood and Other Modifications , 2017, ICLR.
[40] Yoshua Bengio,et al. Char2Wav: End-to-End Speech Synthesis , 2017, ICLR.
[41] Hideki Kawahara,et al. Nearly defect-free F0 trajectory extraction for expressive speech modifications based on STRAIGHT , 2005, INTERSPEECH.
[42] Alex Graves,et al. Conditional Image Generation with PixelCNN Decoders , 2016, NIPS.
[43] Heiga Zen,et al. Speech Synthesis Based on Hidden Markov Models , 2013, Proceedings of the IEEE.
[44] Yuxuan Wang,et al. Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis , 2018, ICML.
[45] Bajibabu Bollepalli,et al. A Comparison Between STRAIGHT, Glottal, and Sinusoidal Vocoding in Statistical Parametric Speech Synthesis , 2018, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[46] Yoshua Bengio,et al. SampleRNN: An Unconditional End-to-End Neural Audio Generation Model , 2016, ICLR.
[47] Heiga Zen,et al. Details of the Nitech HMM-Based Speech Synthesis System for the Blizzard Challenge 2005 , 2007, IEICE Trans. Inf. Syst..
[48] Vassilis Tsiaras,et al. ON the Use of Wavenet as a Statistical Vocoder , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[49] Keiichi Tokuda,et al. An adaptive algorithm for mel-cepstral analysis of speech , 1992, [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[50] Yoshihiko Nankaku,et al. Mel-Cepstrum-Based Quantization Noise Shaping Applied to Neural-Network-Based Speech Waveform Synthesis , 2018, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[51] Tomoki Toda,et al. Subband wavenet with overlapped single-sideband filterbanks , 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[52] Keiichi Tokuda,et al. Mel-generalized cepstral analysis - a unified approach to speech spectral estimation , 1994, ICSLP.
[53] Adam Finkelstein,et al. Fftnet: A Real-Time Speaker-Dependent Neural Vocoder , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[54] Yannis Agiomyrgiannakis,et al. Vocaine the vocoder and applications in speech synthesis , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[55] Inma Hernáez,et al. Harmonics Plus Noise Model Based Vocoder for Statistical Parametric Speech Synthesis , 2014, IEEE Journal of Selected Topics in Signal Processing.
[56] Jordi Bonada,et al. A Neural Parametric Singing Synthesizer , 2017, INTERSPEECH.
[57] Tomoki Toda,et al. An Investigation of Noise Shaping with Perceptual Weighting for Wavenet-Based Speech Generation , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[58] Heiga Zen,et al. Parallel WaveNet: Fast High-Fidelity Speech Synthesis , 2017, ICML.
[59] Zhen-Hua Ling,et al. Samplernn-Based Neural Vocoder for Statistical Parametric Speech Synthesis , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).