[1] Masanori Morise, et al. WORLD: A Vocoder-Based High-Quality Speech Synthesis System for Real-Time Applications, 2016, IEICE Trans. Inf. Syst.
[2] Yann Dauphin, et al. Convolutional Sequence to Sequence Learning, 2017, ICML.
[3] Navdeep Jaitly, et al. Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions, 2017, ICASSP 2018.
[4] Jürgen Schmidhuber, et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks, 2006, ICML.
[5] Xu Tan, et al. FastSpeech: Fast, Robust and Controllable Text to Speech, 2019, NeurIPS.
[6] Samy Bengio, et al. Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks, 2015, NIPS.
[7] Wei Chen, et al. Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech, 2020, ArXiv.
[8] Hiroaki Sakoe, et al. A Dynamic Programming Approach to Continuous Speech Recognition, 1971.
[9] Sercan Ömer Arik, et al. Deep Voice 3: 2000-Speaker Neural Text-to-Speech, 2017, ICLR.
[10] Yoshua Bengio, et al. Generative Adversarial Nets, 2014, NIPS.
[11] Lior Wolf, et al. VoiceLoop: Voice Fitting and Synthesis via a Phonological Loop, 2017, ICLR.
[12] Sungwon Kim, et al. Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search, 2020, NeurIPS.
[13] Joseph P. Olive, et al. Text-to-speech synthesis, 1995, AT&T Technical Journal.
[14] Geoffrey E. Hinton, et al. Layer Normalization, 2016, ArXiv.
[15] Soroosh Mariooryad, et al. Location-Relative Attention Mechanisms for Robust Long-Form Speech Synthesis, 2020, ICASSP 2020.
[16] Jae Hyun Lim, et al. Geometric GAN, 2017, ArXiv.
[17] Shujie Liu, et al. Neural Speech Synthesis with Transformer Network, 2018, AAAI.
[18] Shlomo Dubnov, et al. Expediting TTS Synthesis with Adversarial Vocoding, 2019, INTERSPEECH.
[19] Sercan Ömer Arik, et al. Deep Voice 2: Multi-Speaker Neural Text-to-Speech, 2017, NIPS.
[20] Wei Ping, et al. Non-Autoregressive Neural Text-to-Speech, 2020, ICML.
[21] Takeru Miyato, et al. cGANs with Projection Discriminator, 2018, ICLR.
[22] Jan Skoglund, et al. LPCNet: Improving Neural Speech Synthesis through Linear Prediction, 2018, ICASSP 2019.
[23] Alex Graves, et al. Generating Sequences With Recurrent Neural Networks, 2013, ArXiv.
[24] Dustin Tran, et al. Hierarchical Implicit Models and Likelihood-Free Variational Inference, 2017, NIPS.
[25] Ryuichi Yamamoto, et al. Parallel WaveGAN: A Fast Waveform Generation Model Based on Generative Adversarial Networks with Multi-Resolution Spectrogram, 2020, ICASSP 2020.
[26] Samy Bengio, et al. Tacotron: Towards End-to-End Speech Synthesis, 2017, INTERSPEECH.
[27] Jonathon Shlens, et al. A Learned Representation For Artistic Style, 2016, ICLR.
[28] Colin Raffel, et al. Online and Linear-Time Attention by Enforcing Monotonic Alignments, 2017, ICML.
[29] Alex Graves, et al. Sequence Transduction with Recurrent Neural Networks, 2012, ArXiv.
[30] Heiga Zen, et al. Statistical Parametric Speech Synthesis, 2007, ICASSP 2007.
[31] Lukasz Kaiser, et al. Attention Is All You Need, 2017, NIPS.
[32] Heiga Zen, et al. Parallel WaveNet: Fast High-Fidelity Speech Synthesis, 2017, ICML.
[33] Eric Nalisnick, et al. Normalizing Flows for Probabilistic Modeling and Inference, 2019, J. Mach. Learn. Res.
[34] Eunwoo Song, et al. Probability density distillation with generative adversarial networks for high-quality parallel waveform generation, 2019, INTERSPEECH.
[35] Yuichi Yoshida, et al. Spectral Normalization for Generative Adversarial Networks, 2018, ICLR.
[36] Marc'Aurelio Ranzato, et al. Sequence Level Training with Recurrent Neural Networks, 2015, ICLR.
[37] Heiga Zen, et al. WaveNet: A Generative Model for Raw Audio, 2016, SSW.
[38] Lei He, et al. Robust Sequence-to-Sequence Acoustic Modeling with Stepwise Monotonic Attention for Neural TTS, 2019, INTERSPEECH.
[39] Yoshua Bengio, et al. Char2Wav: End-to-End Speech Synthesis, 2017, ICLR.
[40] Chris Donahue, et al. Adversarial Audio Synthesis, 2018, ICLR.
[41] Zhao Song, et al. Parallel Neural Text-to-Speech, 2019, ArXiv.
[42] Mike Lewis, et al. MelNet: A Generative Model for Audio in the Frequency Domain, 2019, ArXiv.
[43] Colin Raffel, et al. Monotonic Chunkwise Attention, 2017, ICLR.
[44] Xin Wang, et al. Neural Source-Filter-Based Waveform Model for Statistical Parametric Speech Synthesis, 2018, ICASSP 2019.
[45] Jeff Donahue, et al. Large Scale GAN Training for High Fidelity Natural Image Synthesis, 2018, ICLR.
[46] Bryan Catanzaro, et al. Flowtron: An Autoregressive Flow-Based Generative Network for Text-to-Speech Synthesis, 2021, ICLR.
[47] Sergey Ioffe, et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.
[48] Frank Hutter, et al. SGDR: Stochastic Gradient Descent with Warm Restarts, 2016, ICLR.
[49] Martín Abadi, et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems, 2016, ArXiv.
[50] Erich Elsen, et al. High Fidelity Speech Synthesis with Adversarial Networks, 2019, ICLR.
[51] Yoshua Bengio, et al. MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis, 2019, NeurIPS.
[52] Sepp Hochreiter, et al. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium, 2017, NIPS.
[53] Erich Elsen, et al. Efficient Neural Audio Synthesis, 2018, ICML.
[54] Li-Rong Dai, et al. Forward Attention in Sequence-to-Sequence Acoustic Modeling for Speech Synthesis, 2018, ICASSP 2018.
[55] F. Itakura, et al. Minimum prediction residual principle applied to speech recognition, 1975.
[56] Ronald J. Williams, et al. A Learning Algorithm for Continually Running Fully Recurrent Neural Networks, 1989, Neural Computation.
[57] Wei Ping, et al. ClariNet: Parallel Wave Generation in End-to-End Text-to-Speech, 2018, ICLR.
[58] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[59] Jae S. Lim, et al. Signal estimation from modified short-time Fourier transform, 1983, ICASSP.
[60] Sungwon Kim, et al. FloWaveNet: A Generative Flow for Raw Audio, 2018, ICML.
[61] Lei Xie, et al. A New GAN-based End-to-End TTS Training Algorithm, 2019, INTERSPEECH.
[62] Xiaohua Zhai, et al. A Large-Scale Study on Regularization and Normalization in GANs, 2018, ICML.
[63] Sébastien Le Maguer, et al. How to compare TTS systems: a new subjective evaluation methodology focused on differences, 2015, INTERSPEECH.
[64] Chenjie Gu, et al. DDSP: Differentiable Digital Signal Processing, 2020, ICLR.
[65] Gregory Diamos, et al. Fast Spectrogram Inversion Using Multi-Head Convolutional Neural Networks, 2018, IEEE Signal Processing Letters.
[66] Kainan Peng, et al. WaveFlow: A Compact Flow-based Model for Raw Audio, 2020, ICML.
[67] Brian Roark, et al. Neural Models of Text Normalization for Speech Applications, 2019, Computational Linguistics.
[68] S. Chiba, et al. Dynamic programming algorithm optimization for spoken word recognition, 1978.
[69] Adam Coates, et al. Deep Voice: Real-time Neural Text-to-Speech, 2017, ICML.
[70] Hugo Larochelle, et al. Modulating early visual processing by language, 2017, NIPS.
[71] Marco Cuturi, et al. Soft-DTW: A Differentiable Loss Function for Time-Series, 2017, ICML.
[72] Hemant A. Patil, et al. Fusion of magnitude and phase-based features for objective evaluation of TTS voice, 2014, ISCSLP.
[73] Ryan Prenger, et al. WaveGlow: A Flow-based Generative Network for Speech Synthesis, 2018, ICASSP 2019.
[74] Geoffrey E. Hinton, et al. Distilling the Knowledge in a Neural Network, 2015, ArXiv.
[75] Shuang Liang, et al. Flow-TTS: A Non-Autoregressive Network for Text to Speech Based on Flow, 2020, ICASSP 2020.
[76] Yoshua Bengio, et al. SampleRNN: An Unconditional End-to-End Neural Audio Generation Model, 2016, ICLR.