Explicit Intensity Control for Accented Text-to-speech
Haizhou Li | Guanglai Gao | Rui Liu | Haolin Zuo | De Hu
[1] T. Shinozaki,et al. Self-Supervised Learning with Multi-Target Contrastive Coding for Non-Native Acoustic Modeling of Mispronunciation Verification , 2022, INTERSPEECH.
[2] Björn Schuller,et al. Accurate Emotion Strength Assessment for Seen and Unseen Speech Based on Data-Driven Deep Learning , 2022, INTERSPEECH.
[3] Rubén Pérez Ramón,et al. Foreign accent strength and intelligibility at the segmental level , 2022, Speech Commun..
[4] Brian Kan-Wing Mak,et al. Multi-Lingual Multi-Speaker Text-to-Speech Synthesis for Voice Cloning with Online Speaker Enrollment , 2020, INTERSPEECH.
[5] Jaehyeon Kim,et al. HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis , 2020, NeurIPS.
[6] Tie-Yan Liu,et al. FastSpeech 2: Fast and High-Quality End-to-End Text to Speech , 2020, ICLR.
[7] Heiga Zen,et al. Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning , 2019, INTERSPEECH.
[8] Ricardo Gutierrez-Osuna,et al. L2-ARCTIC: A Non-native English Speech Corpus , 2018, INTERSPEECH.
[9] Florin Curelaru,et al. Front-End Factor Analysis For Speaker Verification , 2018, 2018 International Conference on Communications (COMM).
[10] Junichi Yamagishi,et al. Cyborg Speech: Deep Multilingual Speech Synthesis for Generating Segmental Foreign Accent with Natural Prosody , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[11] Navdeep Jaitly,et al. Natural TTS Synthesis by Conditioning Wavenet on MEL Spectrogram Predictions , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[12] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[13] Samy Bengio,et al. Tacotron: A Fully End-to-End Text-To-Speech Synthesis Model , 2017, ArXiv.
[14] Sangramsing N. Kayte,et al. Speech Synthesis System for Marathi Accent using FESTVOX, 2015.
[15] François Laviolette,et al. Domain-Adversarial Training of Neural Networks , 2015, J. Mach. Learn. Res..
[16] Sanjeev Khudanpur,et al. Librispeech: An ASR corpus based on public domain audio books , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[17] Yong Wang,et al. Improved mispronunciation detection with deep neural network trained acoustic models and transfer learning based logistic regression classifiers , 2015, Speech Commun..
[18] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[19] Simon King,et al. The voice bank corpus: Design, collection and data analysis of a large regional accent speech database , 2013, 2013 International Conference Oriental COCOSDA held jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE).
[20] Alan W. Black,et al. Accent Group modeling for improved prosody in statistical parameteric speech synthesis , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[21] Chai Wutiwiwatchai,et al. Accent level adjustment in bilingual Thai-English text-to-speech synthesis , 2011, 2011 IEEE Workshop on Automatic Speech Recognition & Understanding.
[22] Patrick Kenny,et al. Front-End Factor Analysis for Speaker Verification , 2011, IEEE Transactions on Audio, Speech, and Language Processing.
[23] Kristin Precoda,et al. EduSpeak®: A speech recognition and pronunciation scoring toolkit for computer-aided language learning applications, 2010.
[24] Junichi Yamagishi,et al. Modeling and interpolation of Austrian German and Viennese dialect in HMM-based speech synthesis , 2010, Speech Commun..
[25] Tracey M. Derwing,et al. THE MUTUAL INTELLIGIBILITY OF L2 SPEECH , 2006, Studies in Second Language Acquisition.
[26] Steve J. Young,et al. Phone-level pronunciation scoring and assessment for interactive language learning , 2000, Speech Commun..
[27] Alex Waibel,et al. Consonant recognition by modular construction of large phonemic time-delay neural networks , 1989, International Conference on Acoustics, Speech, and Signal Processing.
[29] Sanjeev Khudanpur,et al. A time delay neural network architecture for efficient modeling of long temporal contexts , 2015, INTERSPEECH.
[30] Junichi Yamagishi,et al. Generating segmental foreign accent , 2014, INTERSPEECH.
[31] Thomas Niesler,et al. Automatic conversion between pronunciations of different English accents , 2011, Speech Commun..