Speaker consistency loss and step-wise optimization for semi-supervised joint training of TTS and ASR using unpaired text data
暂无分享,去创建一个
[1] Ramón Fernández Astudillo,et al. Eat: Enhanced ASR-TTS for Self-Supervised Speech Recognition , 2021, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[2] Yonghui Wu,et al. PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS , 2021, Interspeech.
[3] Hung-yi Lee,et al. Investigating on Incorporating Pretrained and Learnable Speaker Representations for Multi-Speaker Multi-Style Text-to-Speech , 2021, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[4] Ryuichi Yamamoto,et al. TTS-by-TTS: TTS-Driven Data Augmentation for Fast and High-Quality Speech Synthesis , 2020, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[5] Tie-Yan Liu,et al. FastSpeech 2: Fast and High-Quality End-to-End Text to Speech , 2020, ICLR.
[6] Tie-Yan Liu,et al. LRSpeech: Extremely Low-Resource Speech Synthesis and Recognition , 2020, KDD.
[7] Ming Li,et al. From Speaker Verification to Multispeaker Speech Synthesis, Deep Transfer with Feedback Constraint , 2020, INTERSPEECH.
[8] Satoshi Nakamura,et al. Machine Speech Chain , 2020, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[9] Liyuan Liu,et al. On the Variance of the Adaptive Learning Rate and Beyond , 2019, ICLR.
[10] Yoshua Bengio,et al. MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis , 2019, NeurIPS.
[11] Bhuvana Ramabhadran,et al. Speech Recognition with Augmented Synthesized Speech , 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[12] Xu Tan,et al. Almost Unsupervised Text to Speech and Automatic Speech Recognition , 2019, ICML.
[13] Tomoharu Iwata,et al. Semi-supervised End-to-end Speech Recognition Using Text-to-speech and Autoencoders , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[14] Quoc V. Le,et al. SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition , 2019, INTERSPEECH.
[15] Heiga Zen,et al. LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech , 2019, INTERSPEECH.
[16] Jonathan Le Roux,et al. Cycle-consistency Training for End-to-end Speech Recognition , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[17] Shujie Liu,et al. Neural Speech Synthesis with Transformer Network , 2018, AAAI.
[18] Yuxuan Wang,et al. Semi-supervised Training for Improving Data Efficiency in End-to-end Speech Synthesis , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[19] Joon Son Chung,et al. VoxCeleb2: Deep Speaker Recognition , 2018, INTERSPEECH.
[20] Koichi Shinoda,et al. Attentive Statistics Pooling for Deep Speaker Embedding , 2018, INTERSPEECH.
[21] Navdeep Jaitly,et al. Natural TTS Synthesis by Conditioning Wavenet on MEL Spectrogram Predictions , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[22] Tara N. Sainath,et al. An Analysis of Incorporating an External Language Model into a Sequence-to-Sequence Model , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[23] Adam Coates,et al. Cold Fusion: Training Seq2Seq Models Together with Language Models , 2017, INTERSPEECH.
[24] Harshad Rai,et al. Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks , 2018 .
[25] Morgan Sonderegger,et al. Montreal Forced Aligner: Trainable Text-Speech Alignment Using Kaldi , 2017, INTERSPEECH.
[26] Tie-Yan Liu,et al. Dual Learning for Machine Translation , 2016, NIPS.
[27] Masanori Morise,et al. Error Evaluation of an F0-Adaptive Spectral Envelope Estimator in Robustness against the Additive Noise and F0 Error , 2015, IEICE Trans. Inf. Syst..
[28] Samy Bengio,et al. Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks , 2015, NIPS.
[29] Masanori Morise,et al. CheapTrick, a spectral envelope estimator for high-quality speech synthesis , 2015, Speech Commun..
[30] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[31] R. Kubichek,et al. Mel-cepstral distance measure for objective speech quality assessment , 1993, Proceedings of IEEE Pacific Rim Conference on Communications Computers and Signal Processing.
[32] P. Denes. The Speech Chain , 1963 .