Quasi-Periodic WaveNet: An Autoregressive Raw Waveform Generative Model With Pitch-Dependent Dilated Convolution Neural Network
Tomoki Toda | Yi-Chiao Wu | Tomoki Hayashi | Kazuhiro Kobayashi | Patrick Lumban Tobing
[1] Navdeep Jaitly,et al. Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[2] Phil Clendeninn. The Vocoder , 1940, Nature.
[3] Erich Elsen,et al. Efficient Neural Audio Synthesis , 2018, ICML.
[4] Bajibabu Bollepalli,et al. Speaker-independent raw waveform model for glottal excitation , 2018, INTERSPEECH.
[5] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[6] Tomoki Toda,et al. The NU Non-Parallel Voice Conversion System for the Voice Conversion Challenge 2018 , 2018, Odyssey.
[7] Xin Wang,et al. Neural Source-Filter Waveform Models for Statistical Parametric Speech Synthesis , 2019, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[8] Vassilis Tsiaras,et al. On the Use of WaveNet as a Statistical Vocoder , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[9] Logan Volkers,et al. PHASE VOCODER , 2008 .
[10] M. Mathews,et al. Pitch Synchronous Analysis of Voiced Sounds , 1961 .
[11] Wei Ping,et al. ClariNet: Parallel Wave Generation in End-to-End Text-to-Speech , 2018, ICLR.
[12] Max Welling,et al. Improved Variational Inference with Inverse Autoregressive Flow , 2016, NIPS 2016.
[13] Prafulla Dhariwal,et al. Glow: Generative Flow with Invertible 1x1 Convolutions , 2018, NeurIPS.
[14] Karen Simonyan,et al. The challenge of realistic music generation: modelling raw audio at scale , 2018, NeurIPS.
[15] Adam Finkelstein,et al. FFTNet: A Real-Time Speaker-Dependent Neural Vocoder , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[16] Yoshihiko Nankaku,et al. Deep neural network based real-time speech vocoder with periodic and aperiodic inputs , 2019, 10th ISCA Workshop on Speech Synthesis (SSW 10).
[17] Xavier Serra,et al. A WaveNet for Speech Denoising , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[18] Sercan Ömer Arik,et al. Deep Voice 3: 2000-Speaker Neural Text-to-Speech , 2017, ICLR 2018.
[19] Tomoki Toda,et al. NU Voice Conversion System for the Voice Conversion Challenge 2018 , 2018, Odyssey.
[20] Tomoki Toda,et al. An investigation of multi-speaker training for WaveNet vocoder , 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[21] Tomoki Toda,et al. Non-Parallel Voice Conversion System With WaveNet Vocoder and Collapsed Speech Suppression , 2020, IEEE Access.
[22] C.E. Shannon,et al. Communication in the Presence of Noise , 1949, Proceedings of the IRE.
[23] Quan Wang,et al. WaveNet Based Low Rate Speech Coding , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[24] Heiga Zen,et al. WaveNet: A Generative Model for Raw Audio , 2016, SSW.
[25] Masanori Morise,et al. CheapTrick, a spectral envelope estimator for high-quality speech synthesis , 2015, Speech Commun..
[26] Vladlen Koltun,et al. Multi-Scale Context Aggregation by Dilated Convolutions , 2015, ICLR.
[27] Heiga Zen,et al. Parallel WaveNet: Fast High-Fidelity Speech Synthesis , 2017, ICML.
[28] Zhen-Hua Ling,et al. SampleRNN-Based Neural Vocoder for Statistical Parametric Speech Synthesis , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[29] Geoffrey E. Hinton,et al. Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.
[30] Ryan Prenger,et al. WaveGlow: A Flow-based Generative Network for Speech Synthesis , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[31] Tomoki Toda,et al. An Investigation of Noise Shaping with Perceptual Weighting for WaveNet-Based Speech Generation , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[32] Tomoki Toda,et al. Collapsed speech segment detection and suppression for WaveNet vocoder , 2018, INTERSPEECH.
[33] Tomoki Toda,et al. Statistical Voice Conversion with WaveNet-Based Waveform Generation , 2017, INTERSPEECH.
[34] Tomoki Toda,et al. Quasi-Periodic WaveNet Vocoder: A Pitch Dependent Dilated Convolution Model for Parametric Speech Generation , 2019, INTERSPEECH.
[35] Tomoki Toda,et al. sprocket: Open-Source Voice Conversion Software , 2018, Odyssey.
[36] Hideki Kawahara,et al. Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds , 1999, Speech Commun..
[37] Yoshua Bengio,et al. Generative Adversarial Nets , 2014, NIPS.
[38] Sercan Ömer Arik,et al. Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning , 2017, ICLR.
[39] Tomoki Toda,et al. Statistical Voice Conversion with Quasi-Periodic WaveNet Vocoder , 2019, 10th ISCA Workshop on Speech Synthesis (SSW 10).
[40] Manfred R. Schroeder,et al. Code-excited linear prediction (CELP): High-quality speech at very low bit rates , 1985, ICASSP '85. IEEE International Conference on Acoustics, Speech, and Signal Processing.
[41] Junichi Yamagishi,et al. The Voice Conversion Challenge 2018: Promoting Development of Parallel and Nonparallel Methods , 2018, Odyssey.
[42] Bishnu S. Atal,et al. Improving performance of multi-pulse LPC coders at low bit rates , 1984, ICASSP.
[43] Yoshua Bengio,et al. SampleRNN: An Unconditional End-to-End Neural Audio Generation Model , 2016, ICLR.
[44] Mark Hasegawa-Johnson,et al. Speech Enhancement Using Bayesian WaveNet , 2017, INTERSPEECH.
[45] W. Marsden. I and J , 2012 .
[46] Shakir Mohamed,et al. Variational Inference with Normalizing Flows , 2015, ICML.
[47] Thomas F. Quatieri,et al. Speech analysis/Synthesis based on a sinusoidal representation , 1986, IEEE Trans. Acoust. Speech Signal Process..
[48] Manfred R. Schroeder,et al. Vocoders: Analysis and synthesis of speech , 1966 .
[49] Masanori Morise,et al. WORLD: A Vocoder-Based High-Quality Speech Synthesis System for Real-Time Applications , 2016, IEICE Trans. Inf. Syst..
[50] Jan Skoglund,et al. LPCNet: Improving Neural Speech Synthesis through Linear Prediction , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[51] Xi Wang,et al. A New Glottal Neural Vocoder for Speech Synthesis , 2018, INTERSPEECH.
[52] Yoshua Bengio,et al. NICE: Non-linear Independent Components Estimation , 2014, ICLR.
[53] Xin Wang,et al. Neural Source-filter-based Waveform Model for Statistical Parametric Speech Synthesis , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[54] Amro El-Jaroudi,et al. Discrete all-pole modeling , 1991, IEEE Trans. Signal Process..
[55] Sungwon Kim,et al. FloWaveNet: A Generative Flow for Raw Audio , 2018, ICML.
[56] Tomoki Toda,et al. Speaker-Dependent WaveNet Vocoder , 2017, INTERSPEECH.
[57] Biing-Hwang Juang,et al. An 800 bit/s vector quantization LPC vocoder , 1982 .
[58] L. H. Anauer,et al. Speech Analysis and Synthesis by Linear Prediction of the Speech Wave , 2000 .