暂无分享,去创建一个
Yu Ting Yeung | Tan Lee | Daxin Tan | Liqun Deng | Xiao Chen | Xin Jiang | Tan Lee | Liqun Deng | Xiao Chen | Daxin Tan | Xin Jiang | Y. Yeung
[1] Eric Moulines,et al. Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones , 1989, Speech Commun..
[2] Jiangyan Yi,et al. Forward–Backward Decoding Sequence for Regularizing End-to-End TTS , 2019, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[3] Sercan Ömer Arik,et al. Deep Voice 3: 2000-Speaker Neural Text-to-Speech , 2017, ICLR 2018.
[4] Steve Whittaker,et al. Semantic speech editing , 2004, CHI.
[5] Ryan Prenger,et al. Waveglow: A Flow-based Generative Network for Speech Synthesis , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[6] Sungwon Kim,et al. Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search , 2020, NeurIPS.
[7] Meinard Müller,et al. Information retrieval for music and motion , 2007 .
[8] Alan Bundy,et al. Dynamic Time Warping , 1984 .
[9] A. Finkelstein,et al. HiFi-GAN: High-Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks , 2020, INTERSPEECH.
[10] Shujie Liu,et al. Neural Speech Synthesis with Transformer Network , 2018, AAAI.
[11] Zeyu Jin,et al. Context-Aware Prosody Correction for Text-Based Speech Editing , 2021, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[12] Jianhua Tao,et al. Patnet : A Phoneme-Level Autoregressive Transformer Network for Speech Synthesis , 2021, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[13] Jiajun Zhang,et al. Synchronous Bidirectional Neural Machine Translation , 2019, TACL.
[14] Mark D. Plumbley,et al. A Contextual Study of Semantic Speech Editing in Radio Production , 2018, Int. J. Hum. Comput. Stud..
[15] Chengzhu Yu,et al. DurIAN: Duration Informed Attention Network For Multimodal Synthesis , 2019, ArXiv.
[16] Yoshua Bengio,et al. MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis , 2019, NeurIPS.
[17] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[18] Gautham J. Mysore,et al. VoCo , 2017, ACM Trans. Graph..
[19] Morgan Sonderegger,et al. Montreal Forced Aligner: Trainable Text-Speech Alignment Using Kaldi , 2017, INTERSPEECH.
[20] Jaehyeon Kim,et al. HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis , 2020, NeurIPS.
[21] Navdeep Jaitly,et al. Natural TTS Synthesis by Conditioning Wavenet on MEL Spectrogram Predictions , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[22] Ryuichi Yamamoto,et al. Parallel Wavegan: A Fast Waveform Generation Model Based on Generative Adversarial Networks with Multi-Resolution Spectrogram , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[23] Wilmot Li,et al. Content-based tools for editing audio stories , 2013, UIST.
[24] Junichi Yamagishi,et al. CSTR VCTK Corpus: English Multi-speaker Corpus for CSTR Voice Cloning Toolkit (version 0.92) , 2019 .
[25] Tao Qin,et al. FastSpeech 2: Fast and High-Quality End-to-End Text to Speech , 2021, ICLR.
[26] Hui Bu,et al. AISHELL-3: A Multi-speaker Mandarin TTS Corpus and the Baselines , 2020, ArXiv.
[27] Jan Skoglund,et al. LPCNET: Improving Neural Speech Synthesis through Linear Prediction , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).