A Modularized Neural Network with Language-Specific Output Layers for Cross-Lingual Voice Conversion
Yi Zhou | Xiaohai Tian | Rohan Kumar Das | Emre Yilmaz | Haizhou Li
[1] Hao Wang, et al. Personalized, Cross-Lingual TTS Using Phonetic Posteriorgrams, 2016, INTERSPEECH.
[2] H. Ney, et al. VTLN-based cross-language voice conversion, 2003, 2003 IEEE Workshop on Automatic Speech Recognition and Understanding (IEEE Cat. No.03EX721).
[3] Kou Tanaka, et al. Cyclegan-VC2: Improved Cyclegan-based Non-parallel Voice Conversion, 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[4] Yu Zhang, et al. A Survey on Multi-Task Learning, 2017, IEEE Transactions on Knowledge and Data Engineering.
[5] Bin Ma, et al. Spoken Language Recognition: From Fundamentals to Practice, 2013, Proceedings of the IEEE.
[6] Tomoki Toda, et al. Voice Conversion Based on Maximum-Likelihood Estimation of Spectral Parameter Trajectory, 2007, IEEE Transactions on Audio, Speech, and Language Processing.
[7] Xunying Liu, et al. Voice Conversion Across Arbitrary Speakers Based on a Single Target-Speaker Utterance, 2018, INTERSPEECH.
[8] Yiming Wang, et al. Purely Sequence-Trained Neural Networks for ASR Based on Lattice-Free MMI, 2016, INTERSPEECH.
[9] Lianhong Cai, et al. Learning cross-lingual knowledge with multilingual BLSTM for emphasis detection with limited training data, 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[10] Frank K. Soong, et al. Speaker and language factorization in DNN-based TTS synthesis, 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[11] Tomoki Toda, et al. The Voice Conversion Challenge 2016, 2016, INTERSPEECH.
[12] Zhenye Gan, et al. Mandarin-Tibetan Cross-Lingual Voice Conversion System Based on Deep Neural Network, 2018, CSAI '18.
[13] Keiichi Tokuda, et al. Incorporating a mixed excitation model and postfilter into HMM-based text-to-speech synthesis, 2005, Systems and Computers in Japan.
[14] Antonio Bonafonte, et al. Voice conversion using k-histograms and frame selection, 2009, INTERSPEECH.
[15] Haizhou Li, et al. Sparse representation of phonetic features for voice conversion with and without parallel data, 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[16] Heiga Zen, et al. Statistical Parametric Speech Synthesis Based on Speaker and Language Factorization, 2012, IEEE Transactions on Audio, Speech, and Language Processing.
[17] Keiichi Tokuda, et al. Speech parameter generation algorithms for HMM-based speech synthesis, 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).
[18] Haifeng Li, et al. A KL Divergence and DNN-Based Approach to Voice Conversion without Parallel Training Sentences, 2016, INTERSPEECH.
[19] Shinnosuke Takamichi, et al. Non-Parallel Voice Conversion Using Variational Autoencoders Conditioned by Phonetic Posteriorgrams and D-Vectors, 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[20] Yi Zhou, et al. Many-to-many Cross-lingual Voice Conversion with a Jointly Trained Speaker Embedding Network, 2019, 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC).
[21] Masanobu Abe, et al. Cross-language voice conversion, 1990, International Conference on Acoustics, Speech, and Signal Processing.
[22] Daniel Erro, et al. Frame alignment method for cross-lingual voice conversion, 2007, INTERSPEECH.
[23] T. Nagarajan, et al. A Multi-level GMM-Based Cross-Lingual Voice Conversion Using Language-Specific Mixture Weights for Polyglot Synthesis, 2015, Circuits, Systems, and Signal Processing.
[24] Tomoki Toda, et al. Cross-language voice conversion based on eigenvoices, 2009, INTERSPEECH.
[25] Yu Tsao, et al. Voice Conversion from Unaligned Corpora Using Variational Autoencoding Wasserstein Generative Adversarial Networks, 2017, INTERSPEECH.
[26] Daniel Erro, et al. Voice Conversion of Non-aligned Data using Unit Selection, 2006.
[27] Rich Caruana, et al. Multitask Learning, 1997, Machine Learning.
[28] Heiga Zen, et al. Multi-Language Multi-Speaker Acoustic Modeling for LSTM-RNN Based Statistical Parametric Speech Synthesis, 2016, INTERSPEECH.
[29] Geoffrey E. Hinton, et al. Phoneme recognition using time-delay neural networks, 1989, IEEE Trans. Acoust. Speech Signal Process.
[30] Hao Wang, et al. Phonetic posteriorgrams for many-to-one voice conversion without parallel data training, 2016, 2016 IEEE International Conference on Multimedia and Expo (ICME).
[31] Frank K. Soong, et al. A frame mapping based HMM approach to cross-lingual voice transformation, 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[32] Haizhou Li, et al. Joint training framework for text-to-speech and voice conversion using multi-source Tacotron and WaveNet, 2019, INTERSPEECH.
[33] Zhizheng Wu, et al. Merlin: An Open Source Neural Network Speech Synthesis System, 2016, SSW.
[34] Rich Caruana, et al. Multitask Learning, 1998, Encyclopedia of Machine Learning and Data Mining.
[35] Yi Zhou, et al. Speaker-independent Spectral Mapping for Speech-to-Singing Conversion, 2019, 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC).
[36] Hermann Ney, et al. Text-Independent Voice Conversion Based on Unit Selection, 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.
[37] Daniel Povey, et al. The Kaldi Speech Recognition Toolkit, 2011.
[38] Chng Eng Siong, et al. A Speaker-Dependent WaveNet for Voice Conversion with Non-Parallel Data, 2019, INTERSPEECH.
[39] Sebastian Ruder, et al. An Overview of Multi-Task Learning in Deep Neural Networks, 2017, ArXiv.
[40] Haizhou Li, et al. A Voice Conversion Framework with Tandem Feature Sparse Representation and Speaker-Adapted WaveNet Vocoder, 2018, INTERSPEECH.
[41] Haizhou Li, et al. Adaptive Wavenet Vocoder for Residual Compensation in GAN-Based Voice Conversion, 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).
[42] Masanori Morise, et al. WORLD: A Vocoder-Based High-Quality Speech Synthesis System for Real-Time Applications, 2016, IEICE Trans. Inf. Syst.
[43] Mirjam Wester, et al. The EMIME Mandarin Bilingual Database, 2011.
[44] Alan W. Black, et al. Unit selection in a concatenative speech synthesis system using a large speech database, 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.
[45] Sanjeev Khudanpur, et al. A time delay neural network architecture for efficient modeling of long temporal contexts, 2015, INTERSPEECH.
[46] Haizhou Li, et al. Average Modeling Approach to Voice Conversion with Non-Parallel Data, 2018, Odyssey.
[47] Kishore Prahallad, et al. A Framework for Cross-Lingual Voice Conversion using Artificial Neural Networks, 2009.
[48] Patrick Kenny, et al. Front-End Factor Analysis for Speaker Verification, 2011, IEEE Transactions on Audio, Speech, and Language Processing.
[49] K. Shikano, et al. Statistical analysis of bilingual speaker's speech for cross-language voice conversion, 1991, The Journal of the Acoustical Society of America.
[50] Hui Bu, et al. AISHELL-2: Transforming Mandarin ASR Research Into Industrial Scale, 2018, ArXiv.
[51] Haizhou Li, et al. Cross-lingual Voice Conversion with Bilingual Phonetic Posteriorgram and Average Modeling, 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[52] Haizhou Li, et al. Group Sparse Representation With WaveNet Vocoder Adaptation for Spectrum and Prosody Conversion, 2019, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[53] Sanjeev Khudanpur, et al. Librispeech: An ASR corpus based on public domain audio books, 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[54] Daniel Erro, et al. INCA Algorithm for Training Voice Conversion Systems From Nonparallel Corpora, 2010, IEEE Transactions on Audio, Speech, and Language Processing.