Fcl-Taco2: Towards Fast, Controllable and Lightweight Text-to-Speech Synthesis
暂无分享,去创建一个
Helen M. Meng | Yu Ting Yeung | Xunying Liu | Disong Wang | Nianzu Zheng | Liqun Deng | Yang Zhang | Xiao Chen | H. Meng | Xunying Liu | Disong Wang | Liqun Deng | Xiao Chen | Nianzu Zheng | Yang Zhang | Y. Yeung
[1] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[2] Boris Ginsburg,et al. TalkNet: Fully-Convolutional Non-Autoregressive Speech Synthesis Model , 2020, 2005.05514.
[3] Jan Skoglund,et al. A Real-Time Wideband Neural Vocoder at 1.6 kb/s Using LPCNet , 2019, INTERSPEECH.
[4] Heiga Zen,et al. Fast, Compact, and High Quality LSTM-RNN Based Statistical Parametric Speech Synthesizers for Mobile Devices , 2016, INTERSPEECH.
[5] Qun Liu,et al. TinyBERT: Distilling BERT for Natural Language Understanding , 2020, EMNLP.
[6] Tao Qin,et al. FastSpeech 2: Fast and High-Quality End-to-End Text to Speech , 2021, ICLR.
[7] Ondrej Dusek,et al. SpeedySpeech: Efficient Neural Speech Synthesis , 2020, INTERSPEECH.
[8] Sungwon Kim,et al. Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search , 2020, NeurIPS.
[9] Samy Bengio,et al. Tacotron: Towards End-to-End Speech Synthesis , 2017, INTERSPEECH.
[10] Shujie Liu,et al. Neural Speech Synthesis with Transformer Network , 2018, AAAI.
[11] Tian Xia,et al. Aligntts: Efficient Feed-Forward Text-to-Speech System Without Explicit Alignment , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[12] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[13] Farinaz Koushanfar,et al. FastWave: Accelerating Autoregressive Convolutional Neural Networks on FPGA , 2019, 2019 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).
[14] Junmo Kim,et al. A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[15] Adrian La'ncucki. FastPitch: Parallel Text-to-speech with Pitch Prediction , 2020, ArXiv.
[16] Zhao Song,et al. Parallel Neural Text-to-Speech , 2019, ArXiv.
[17] Shujie Liu,et al. MoBoAligner: a Neural Alignment Model for Non-autoregressive TTS with Monotonic Boundary Search , 2020, INTERSPEECH.
[18] Morgan Sonderegger,et al. Montreal Forced Aligner: Trainable Text-Speech Alignment Using Kaldi , 2017, INTERSPEECH.
[19] Heiga Zen,et al. Deep Learning for Acoustic Modeling in Parametric Speech Generation: A systematic review of existing techniques and future trends , 2015, IEEE Signal Processing Magazine.
[20] Harry Shum,et al. From Eliza to XiaoIce: challenges and opportunities with social chatbots , 2018, Frontiers of Information Technology & Electronic Engineering.
[21] Kurt Keutzer,et al. SqueezeWave: Extremely Lightweight Vocoders for On-device Speech Synthesis , 2020, ArXiv.
[22] Ran Zhang,et al. PPSpeech: Phrase based Parallel End-to-End TTS System , 2020, ArXiv.
[23] Shinji Watanabe,et al. ESPnet: End-to-End Speech Processing Toolkit , 2018, INTERSPEECH.
[24] Geoffrey E. Hinton,et al. Distilling the Knowledge in a Neural Network , 2015, ArXiv.
[25] Shuang Liang,et al. Flow-TTS: A Non-Autoregressive Network for Text to Speech Based on Flow , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[26] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[27] Sercan Ömer Arik,et al. Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning , 2017, ICLR.
[28] Ryuichi Yamamoto,et al. Parallel Wavegan: A Fast Waveform Generation Model Based on Generative Adversarial Networks with Multi-Resolution Spectrogram , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[29] Navdeep Jaitly,et al. Natural TTS Synthesis by Conditioning Wavenet on MEL Spectrogram Predictions , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[30] Xu Tan,et al. FastSpeech: Fast, Robust and Controllable Text to Speech , 2019, NeurIPS.
[31] Yoshua Bengio,et al. Attention-Based Models for Speech Recognition , 2015, NIPS.