Dynamic Soft Windowing and Language Dependent Style Token for Code-Switching End-to-End Speech Synthesis
[1] Haifeng Li,et al. A KL divergence and DNN approach to cross-lingual TTS , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[2] Ming Zhou,et al. Close to Human Quality TTS with Transformer , 2018, ArXiv.
[3] Frank K. Soong,et al. An HMM-based bilingual (Mandarin-English) TTS , 2007, SSW.
[4] Frank K. Soong,et al. HMM-Based Mixed-Language (Mandarin-English) Speech Synthesis , 2008, 2008 6th International Symposium on Chinese Spoken Language Processing.
[5] Yuxuan Wang,et al. Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron , 2018, ICML.
[6] Li-Rong Dai,et al. Forward Attention in Sequence-to-Sequence Acoustic Modeling for Speech Synthesis , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[7] Beat Pfister,et al. From multilingual to polyglot speech synthesis , 1999, EUROSPEECH.
[8] Frank K. Soong,et al. A Cross-Language State Sharing and Mapping Approach to Bilingual (Mandarin–English) TTS , 2009, IEEE Transactions on Audio, Speech, and Language Processing.
[9] Heiga Zen,et al. Statistical Parametric Speech Synthesis Based on Speaker and Language Factorization , 2012, IEEE Transactions on Audio, Speech, and Language Processing.
[10] Frank K. Soong,et al. Speaker and language factorization in DNN-based TTS synthesis , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[11] Jan Skoglund,et al. LPCNet: Improving Neural Speech Synthesis through Linear Prediction , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[12] Wei Song,et al. Building a mixed-lingual neural TTS system with only monolingual data , 2019, INTERSPEECH.
[13] Yoshua Bengio,et al. Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.
[14] Elizabeth A. Strickland,et al. An Introduction to the Psychology of Hearing (6th edition) , 2014.
[15] Alan W. Black,et al. Experiments with Cross-lingual Systems for Synthesis of Code-Mixed Text , 2016, SSW.
[16] Tao Wang,et al. Focusing on Attention: Prosody Transfer and Adaptative Optimization Strategy for Multi-Speaker End-to-End Speech Synthesis , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[17] Alan W. Black,et al. Speech Synthesis of Code-Mixed Text , 2016, LREC.
[18] Navdeep Jaitly,et al. Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[19] Heiga Zen,et al. Statistical Parametric Speech Synthesis , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.
[20] Alan W. Black,et al. Speech Synthesis for Mixed-Language Navigation Instructions , 2017, INTERSPEECH.
[21] Frank K. Soong,et al. Turning a Monolingual Speaker into Multilingual for a Mixed-language TTS , 2012, INTERSPEECH.
[22] Sadaoki Furui,et al. Polyglot synthesis using a mixture of monolingual corpora , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005.
[23] Tara N. Sainath,et al. Bytes Are All You Need: End-to-end Multilingual Speech Recognition and Synthesis with Bytes , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[24] Yoshua Bengio,et al. Attention-Based Models for Speech Recognition , 2015, NIPS.
[25] Sercan Ömer Arik,et al. Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning , 2017, ICLR.
[26] Sanjiv Kumar,et al. On the Convergence of Adam and Beyond , 2018.
[27] T. Nagarajan,et al. Voice conversion-based multilingual to polyglot speech synthesizer for Indian languages , 2013, 2013 IEEE International Conference of IEEE Region 10 (TENCON 2013).
[28] Heiga Zen,et al. Parallel WaveNet: Fast High-Fidelity Speech Synthesis , 2017, ICML.
[29] E. Owens. Introduction to the Psychology of Hearing , 1977.
[30] Heiga Zen,et al. WaveNet: A Generative Model for Raw Audio , 2016, SSW.
[31] Heiga Zen,et al. Multi-Language Multi-Speaker Acoustic Modeling for LSTM-RNN Based Statistical Parametric Speech Synthesis , 2016, INTERSPEECH.