Hierarchical Transfer Learning for Multilingual, Multi-Speaker, and Style Transfer DNN-Based TTS on Low-Resource Languages
[1] Hongfei Lin,et al. Low-Resource Cross-Domain Product Review Sentiment Classification Based on a CNN with an Auxiliary Large-Scale Corpus , 2017, Algorithms.
[2] Ausif Mahmood,et al. Review of Deep Learning Algorithms and Architectures , 2019, IEEE Access.
[3] Satoshi Nakamura,et al. Listening while speaking: Speech chain by deep learning , 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[4] Heiga Zen,et al. Statistical parametric speech synthesis using deep neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[5] Supheakmungkol Sarin,et al. A Step-by-Step Process for Building TTS Voices Using Open Source Data and Frameworks for Bangla, Javanese, Khmer, Nepali, Sinhala, and Sundanese , 2018, SLTU.
[6] Kexin Feng,et al. Low-Resource Language Identification From Speech Using Transfer Learning , 2019, 2019 IEEE 29th International Workshop on Machine Learning for Signal Processing (MLSP).
[7] Ye-Yi Wang,et al. Is word error rate a good indicator for spoken language understanding accuracy , 2003, 2003 IEEE Workshop on Automatic Speech Recognition and Understanding (IEEE Cat. No.03EX721).
[8] James H. Martin,et al. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition , 2000.
[9] Satoshi Nakamura,et al. Cross-Lingual Machine Speech Chain for Javanese, Sundanese, Balinese, and Bataks Speech Recognition and Synthesis , 2020, SLTU/CCURL@LREC.
[10] Zhiyong Wu,et al. A Review of Deep Learning Based Speech Synthesis , 2019, Applied Sciences.
[11] Heiga Zen,et al. Parallel WaveNet: Fast High-Fidelity Speech Synthesis , 2017, ICML.
[12] Yong Wu,et al. Convolution Neural Network based Transfer Learning for Classification of Flowers , 2018, 2018 IEEE 3rd International Conference on Signal and Image Processing (ICSIP).
[13] R. Kubichek,et al. Mel-cepstral distance measure for objective speech quality assessment , 1993, Proceedings of IEEE Pacific Rim Conference on Communications Computers and Signal Processing.
[14] Dessi Puji Lestari,et al. A Large Vocabulary Continuous Speech Recognition System for Indonesian Language , 2006.
[15] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[16] Yifan Liu,et al. Es-Tacotron2: Multi-Task Tacotron 2 with Pre-Trained Estimated Network for Reducing the Over-Smoothness Problem , 2019, Inf..
[17] Hideki Kawahara,et al. YIN, a fundamental frequency estimator for speech and music. , 2002, The Journal of the Acoustical Society of America.
[18] Hung-yi Lee,et al. End-to-end Text-to-speech for Low-resource Languages by Cross-Lingual Transfer Learning , 2019, INTERSPEECH.
[19] Chris Yakopcic,et al. A State-of-the-Art Survey on Deep Learning Theory and Architectures , 2019, Electronics.
[20] Qiang Yang,et al. A Survey on Transfer Learning , 2010, IEEE Transactions on Knowledge and Data Engineering.
[21] Adam Coates,et al. Deep Voice: Real-time Neural Text-to-Speech , 2017, ICML.
[22] Navdeep Jaitly,et al. Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[23] Mirna Adriani,et al. Hierarchical Transfer Learning for Text-to-Speech in Indonesian, Javanese, and Sundanese Languages , 2020, 2020 International Conference on Advanced Computer Science and Information Systems (ICACSIS).
[24] Quan Wang,et al. Wavenet Based Low Rate Speech Coding , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[25] Suryakanth V. Gangashetty,et al. Deep Elman recurrent neural networks for statistical parametric speech synthesis , 2017, Speech Commun..
[26] Takao Kobayashi,et al. Statistical Parametric Speech Synthesis Using Deep Gaussian Processes , 2019, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[27] Tatsuya Kawahara,et al. Transfer Learning of Language-independent End-to-end ASR with Language Model Fusion , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[28] Prafulla Dhariwal,et al. Glow: Generative Flow with Invertible 1x1 Convolutions , 2018, NeurIPS.
[29] Szu-Lin Wu,et al. Improving Unsupervised Style Transfer in end-to-end Speech Synthesis with end-to-end Speech Recognition , 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).
[30] Mauro Castelli,et al. Transfer Learning with Convolutional Neural Networks for Diabetic Retinopathy Image Classification. A Review , 2020, Applied Sciences.
[31] Xin Wang,et al. Investigation of Enhanced Tacotron Text-to-speech Synthesis Systems with Self-attention for Pitch Accent Language , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[32] Jiangyan Yi,et al. Forward–Backward Decoding Sequence for Regularizing End-to-End TTS , 2019, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[33] Chao Yang,et al. A Survey on Deep Transfer Learning , 2018, ICANN.
[34] Ryan Prenger,et al. WaveGlow: A Flow-based Generative Network for Speech Synthesis , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[35] Takao Kobayashi,et al. Analysis of Speaker Adaptation Algorithms for HMM-Based Speech Synthesis and a Constrained SMAPLR Adaptation Algorithm , 2009, IEEE Transactions on Audio, Speech, and Language Processing.
[36] Yuxuan Wang,et al. Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron , 2018, ICML.
[37] Jianhua Tao,et al. Language-Adversarial Transfer Learning for Low-Resource Speech Recognition , 2019, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[38] Samy Bengio,et al. Tacotron: Towards End-to-End Speech Synthesis , 2017, INTERSPEECH.
[39] Martine Grice,et al. The SUS test: A method for the assessment of text-to-speech synthesis intelligibility using Semantically Unpredictable Sentences , 1996, Speech Commun..
[40] Wesley Mattheyses,et al. Audiovisual speech synthesis: An overview of the state-of-the-art , 2015, Speech Commun..
[41] Yuxuan Wang,et al. Semi-supervised Training for Improving Data Efficiency in End-to-end Speech Synthesis , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[42] Heiga Zen,et al. Speech Synthesis Based on Hidden Markov Models , 2013, Proceedings of the IEEE.
[43] Tomohiro Nakatani,et al. A method for fundamental frequency estimation and voicing decision: Application to infant utterances recorded in real acoustical environments , 2008, Speech Commun..
[44] Yuxuan Wang,et al. Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis , 2018, ICML.
[45] Joseph P. Olive,et al. Text-to-speech synthesis , 1995, AT&T Technical Journal.
[46] Aditya Khamparia,et al. A systematic review on deep learning architectures and applications , 2019, Expert Syst. J. Knowl. Eng..
[47] Abeer Alwan,et al. Reducing F0 Frame Error of F0 tracking algorithms under noisy conditions with an unvoiced/voiced classification frontend , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.
[48] Ryan Prenger,et al. Mellotron: Multispeaker Expressive Voice Synthesis by Conditioning on Rhythm, Pitch and Global Style Tokens , 2019, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[49] Yoshua Bengio,et al. Attention-Based Models for Speech Recognition , 2015, NIPS.
[50] Sercan Ömer Arik,et al. Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning , 2017, ICLR.
[51] Yang Liu,et al. A Teacher-Student Framework for Zero-Resource Neural Machine Translation , 2017, ACL.
[52] Yating Yang,et al. Hierarchical Transfer Learning Architecture for Low-Resource Neural Machine Translation , 2019, IEEE Access.
[53] Chongchong Yu,et al. Cross-Language End-to-End Speech Recognition Research Based on Transfer Learning for the Low-Resource Tujia Language , 2019, Symmetry.
[54] Sercan Ömer Arik,et al. Deep Voice 2: Multi-Speaker Neural Text-to-Speech , 2017, NIPS.
[55] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.