Heiga Zen | Mohammad Norouzi | Ron J. Weiss | Nanxin Chen | William Chan | Yu Zhang
[1] Aapo Hyvärinen,et al. Estimation of Non-Normalized Statistical Models by Score Matching , 2005, J. Mach. Learn. Res.
[2] Hong-Goo Kang,et al. ExcitNet Vocoder: A Neural Excitation Model for Parametric Speech Synthesis Systems , 2019, 2019 27th European Signal Processing Conference (EUSIPCO).
[3] Mike Lewis,et al. MelNet: A Generative Model for Audio in the Frequency Domain , 2019, ArXiv.
[4] Zhen-Hua Ling,et al. Knowledge-and-Data-Driven Amplitude Spectrum Prediction for Hierarchical Neural Vocoders , 2020, INTERSPEECH.
[5] Navdeep Jaitly,et al. Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[6] Yang Song,et al. Sliced Score Matching: A Scalable Approach to Density and Score Estimation , 2019, UAI.
[7] Yoshua Bengio,et al. MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis , 2019, NeurIPS.
[8] Xi Chen,et al. PixelCNN++: Improving the PixelCNN with Discretized Logistic Mixture Likelihood and Other Modifications , 2017, ICLR.
[9] Soroosh Mariooryad,et al. Location-Relative Attention Mechanisms for Robust Long-Form Speech Synthesis , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[10] Erich Elsen,et al. High Fidelity Speech Synthesis with Adversarial Networks , 2019, ICLR.
[11] Noah Snavely,et al. Learning Gradient Fields for Shape Generation , 2020, ECCV.
[12] Lukasz Kaiser,et al. Attention Is All You Need , 2017, NIPS.
[13] Omer Levy,et al. Mask-Predict: Parallel Decoding of Conditional Masked Language Models , 2019, EMNLP.
[14] Pascal Vincent,et al. A Connection Between Score Matching and Denoising Autoencoders , 2011, Neural Computation.
[15] Wei Ping,et al. ClariNet: Parallel Wave Generation in End-to-End Text-to-Speech , 2018, ICLR.
[16] Heiga Zen,et al. Parallel WaveNet: Fast High-Fidelity Speech Synthesis , 2017, ICML.
[17] Yoshua Bengio,et al. Char2Wav: End-to-End Speech Synthesis , 2017, ICLR.
[18] Bernhard Schölkopf,et al. Deep Energy Estimator Networks , 2018, ArXiv.
[19] Xin Wang,et al. Neural Source-Filter Waveform Models for Statistical Parametric Speech Synthesis , 2019, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[20] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[21] Jakob Uszkoreit,et al. An Empirical Study of Generation Order for Machine Translation , 2019, EMNLP.
[22] Navdeep Jaitly,et al. Imputer: Sequence Modelling via Imputation and Dynamic Programming , 2020, ICML.
[23] Mohammad Norouzi,et al. Non-Autoregressive Machine Translation with Latent Alignments , 2020, EMNLP.
[24] R. Kubichek,et al. Mel-cepstral distance measure for objective speech quality assessment , 1993, Proceedings of IEEE Pacific Rim Conference on Communications Computers and Signal Processing.
[25] Jakob Uszkoreit,et al. KERMIT: Generative Insertion-Based Modeling for Sequences , 2019, ArXiv.
[26] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[27] Fadi Biadsy,et al. Parrotron: An End-to-End Speech-to-Speech Conversion Model and its Applications to Hearing-Impaired Speech and Speech Separation , 2019, INTERSPEECH.
[28] William Chan,et al. Big Bidirectional Insertion Representations for Documents , 2019, NGT@EMNLP-IJCNLP.
[29] Bajibabu Bollepalli,et al. GlotNet—A Raw Waveform Model for the Glottal Excitation in Statistical Parametric Speech Synthesis , 2019, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[30] Pieter Abbeel,et al. Denoising Diffusion Probabilistic Models , 2020, NeurIPS.
[31] Kumar Krishna Agrawal,et al. GANSynth: Adversarial Neural Audio Synthesis , 2019, ICLR.
[32] Chenjie Gu,et al. DDSP: Differentiable Digital Signal Processing , 2020, ICLR.
[33] Heiga Zen,et al. WaveNet: A Generative Model for Raw Audio , 2016, SSW.
[34] Melvin Johnson,et al. Direct speech-to-speech translation with a sequence-to-sequence model , 2019, INTERSPEECH.
[35] Jasper Snoek,et al. A Spectral Energy Distance for Parallel Speech Synthesis , 2020, NeurIPS.
[36] Samy Bengio,et al. Tacotron: Towards End-to-End Speech Synthesis , 2017, INTERSPEECH.
[37] Wei Ping,et al. Non-Autoregressive Neural Text-to-Speech , 2020, ICML.
[38] Ryan Prenger,et al. WaveGlow: A Flow-based Generative Network for Speech Synthesis , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[39] Wei Chen,et al. Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech , 2020, ArXiv.
[40] Changhan Wang,et al. Levenshtein Transformer , 2019, NeurIPS.
[41] Youngik Kim,et al. VocGAN: A High-Fidelity Real-time Vocoder with a Hierarchically-nested Adversarial Network , 2020, INTERSPEECH.
[42] Jakob Uszkoreit,et al. Insertion Transformer: Flexible Sequence Generation via Insertion Operations , 2019, ICML.
[43] Ryuichi Yamamoto,et al. Parallel WaveGAN: A Fast Waveform Generation Model Based on Generative Adversarial Networks with Multi-Resolution Spectrogram , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[44] Zhen-Hua Ling,et al. WaveFFJORD: FFJORD-Based Vocoder for Statistical Parametric Speech Synthesis , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[45] Wei Ping,et al. DiffWave: A Versatile Diffusion Model for Audio Synthesis , 2020, ICLR.
[46] Sungwon Kim,et al. FloWaveNet : A Generative Flow for Raw Audio , 2018, ICML.
[47] Mitchell Stern,et al. Insertion-Deletion Transformer , 2020, ArXiv.
[48] Jan Skoglund,et al. LPCNet: Improving Neural Speech Synthesis through Linear Prediction , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[49] Yoshua Bengio,et al. SampleRNN: An Unconditional End-to-End Neural Audio Generation Model , 2016, ICLR.
[50] Taesung Park,et al. Semantic Image Synthesis With Spatially-Adaptive Normalization , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[51] Zohaib Ahmed,et al. HooliGAN: Robust, High Quality Neural Vocoding , 2020, ArXiv.
[52] Jason Lee,et al. Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement , 2018, EMNLP.
[53] Chris Donahue,et al. Adversarial Audio Synthesis , 2018, ICLR.
[54] Nam Soo Kim,et al. WaveNODE: A Continuous Normalizing Flow for Speech Synthesis , 2020, ArXiv.
[55] Chris Donahue,et al. Synthesizing Audio with Generative Adversarial Networks , 2018, ArXiv.
[56] Yang Song,et al. Generative Modeling by Estimating Gradients of the Data Distribution , 2019, NeurIPS.
[57] Surya Ganguli,et al. Deep Unsupervised Learning using Nonequilibrium Thermodynamics , 2015, ICML.
[58] Mohammad Norouzi,et al. Optimal Completion Distillation for Sequence Learning , 2018, ICLR.
[59] Erich Elsen,et al. Efficient Neural Audio Synthesis , 2018, ICML.
[60] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
[61] Abeer Alwan,et al. Reducing F0 Frame Error of F0 tracking algorithms under noisy conditions with an unvoiced/voiced classification frontend , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.
[62] Stefano Ermon,et al. Improved Techniques for Training Score-Based Generative Models , 2020, NeurIPS.
[63] Surya Ganguli,et al. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks , 2013, ICLR.