Hirokazu Kameoka | Tomoki Toda | Yi-Chiao Wu | Tomoki Hayashi | Wen-Chin Huang
[1] Simon King,et al. An Overview of Voice Conversion and Its Challenges: From Statistical Modeling to Deep Learning , 2020, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[2] Junichi Yamagishi,et al. NAUTILUS: A Versatile Voice Cloning System , 2020, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[3] Kou Tanaka,et al. Many-to-Many Voice Transformer Network , 2020, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[4] Babak Naderi,et al. An Open Source Implementation of ITU-T Recommendation P.808 with Validation , 2020, INTERSPEECH.
[5] Seung-won Park,et al. Cotatron: Transcription-Guided Speech Encoder for Any-to-Many Voice Conversion without Parallel Data , 2020, INTERSPEECH.
[6] James R. Glass,et al. Improved Speech Representations with Multi-Target Autoregressive Predictive Coding , 2020, ACL.
[7] Songxiang Liu,et al. Multi-Target Emotional Voice Conversion With Neural Vocoders , 2020, ArXiv.
[8] Hirokazu Kameoka,et al. Voice Transformer Network: Sequence-to-Sequence Voice Conversion Using Transformer with Text-to-Speech Pretraining , 2019, INTERSPEECH.
[9] Ryuichi Yamamoto,et al. Parallel WaveGAN: A Fast Waveform Generation Model Based on Generative Adversarial Networks with Multi-Resolution Spectrogram , 2019, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[10] K. Takeda,et al. Espnet-TTS: Unified, Reproducible, and Integratable Open Source End-to-End Text-to-Speech Toolkit , 2019, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[11] James R. Glass,et al. Generative Pre-Training for Speech with Autoregressive Predictive Coding , 2019, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[12] Michael Auli,et al. vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations , 2019, ICLR.
[13] Ricardo Gutierrez-Osuna,et al. Foreign Accent Conversion by Synthesizing Speech from Phonetic Posteriorgrams , 2019, INTERSPEECH.
[14] Chng Eng Siong,et al. A Speaker-Dependent WaveNet for Voice Conversion with Non-Parallel Data , 2019, INTERSPEECH.
[15] Junichi Yamagishi,et al. Bootstrapping Non-Parallel Voice Conversion from Speaker-Adaptive Text-to-Speech , 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[16] Xiaofei Wang,et al. A Comparative Study on Transformer vs RNN in Speech Applications , 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[17] Praveen Narayanan,et al. Hierarchical Sequence to Sequence Voice Conversion with Limited Data , 2019, ArXiv.
[18] Li-Rong Dai,et al. Non-Parallel Sequence-to-Sequence Voice Conversion With Disentangled Linguistic and Speaker Representations , 2019, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[19] Haizhou Li,et al. VQVAE Unsupervised Unit Discovery and Multi-scale Code2Spec Inverter for Zerospeech Challenge 2019 , 2019, INTERSPEECH.
[20] Haizhou Li,et al. Cross-lingual Voice Conversion with Bilingual Phonetic Posteriorgram and Average Modeling , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[21] Ronan Collobert,et al. wav2vec: Unsupervised Pre-training for Speech Recognition , 2019, INTERSPEECH.
[22] Fadi Biadsy,et al. Parrotron: An End-to-End Speech-to-Speech Conversion Model and its Applications to Hearing-Impaired Speech and Speech Separation , 2019, INTERSPEECH.
[23] Raja Giryes,et al. Taco-VC: A Single Speaker Tacotron based Voice Conversion with Limited Data , 2019, 2020 28th European Signal Processing Conference (EUSIPCO).
[24] Yoshua Bengio,et al. Learning Problem-agnostic Speech Representations from Multiple Self-supervised Tasks , 2019, INTERSPEECH.
[25] James Demmel,et al. Large Batch Optimization for Deep Learning: Training BERT in 76 minutes , 2019, ICLR.
[26] Haizhou Li,et al. Joint training framework for text-to-speech and voice conversion using multi-source Tacotron and WaveNet , 2019, INTERSPEECH.
[27] Ron J. Weiss,et al. Unsupervised Speech Representation Learning Using WaveNet Autoencoders , 2019, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[28] Yoshua Bengio,et al. Learning Speaker Representations with Mutual Information , 2018, INTERSPEECH.
[29] Li-Rong Dai,et al. Improving Sequence-to-sequence Voice Conversion by Adding Text-supervision , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[30] Kou Tanaka,et al. ATTS2S-VC: Sequence-to-sequence Voice Conversion with Attention and Context Preservation Mechanisms , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[31] Hirokazu Kameoka,et al. ConvS2S-VC: Fully convolutional sequence-to-sequence voice conversion , 2018, ArXiv.
[32] Li-Rong Dai,et al. Sequence-to-Sequence Acoustic Modeling for Voice Conversion , 2018, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[33] Haizhou Li,et al. Error Reduction Network for DBLSTM-based Voice Conversion , 2018, 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC).
[34] Shujie Liu,et al. Neural Speech Synthesis with Transformer Network , 2018, AAAI.
[35] Oriol Vinyals,et al. Representation Learning with Contrastive Predictive Coding , 2018, ArXiv.
[36] Shuang Xu,et al. Speech-Transformer: A No-Recurrence Sequence-to-Sequence Model for Speech Recognition , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[37] Junichi Yamagishi,et al. The Voice Conversion Challenge 2018: Promoting Development of Parallel and Nonparallel Methods , 2018, Odyssey.
[38] Shinji Watanabe,et al. ESPnet: End-to-End Speech Processing Toolkit , 2018, INTERSPEECH.
[39] Luke S. Zettlemoyer,et al. Deep Contextualized Word Representations , 2018, NAACL.
[40] Sebastian Ruder,et al. Universal Language Model Fine-tuning for Text Classification , 2018, ACL.
[41] Navdeep Jaitly,et al. Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[42] Tomoki Toda,et al. An investigation of multi-speaker training for WaveNet vocoder , 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[43] Hideyuki Tachibana,et al. Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[44] Tomoki Toda,et al. Speaker-Dependent WaveNet Vocoder , 2017, INTERSPEECH.
[45] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[46] Yann Dauphin,et al. Convolutional Sequence to Sequence Learning , 2017, ICML.
[47] H. Saruwatari,et al. Voice Conversion Using Sequence-to-Sequence Learning of Context Posterior Probabilities , 2017, INTERSPEECH.
[48] Samy Bengio,et al. Tacotron: Towards End-to-End Speech Synthesis , 2017, INTERSPEECH.
[49] Ross B. Girshick,et al. Mask R-CNN , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[50] Tomoki Toda,et al. The Voice Conversion Challenge 2016 , 2016, INTERSPEECH.
[51] Zhizheng Wu,et al. On the use of I-vectors and average voice model for voice conversion without parallel data , 2016, 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA).
[52] Heiga Zen,et al. WaveNet: A Generative Model for Raw Audio , 2016, SSW.
[53] Geoffrey E. Hinton,et al. Layer Normalization , 2016, ArXiv.
[54] Hao Wang,et al. Phonetic posteriorgrams for many-to-one voice conversion without parallel data training , 2016, 2016 IEEE International Conference on Multimedia and Expo (ICME).
[55] Masanori Morise,et al. WORLD: A Vocoder-Based High-Quality Speech Synthesis System for Real-Time Applications , 2016, IEICE Trans. Inf. Syst..
[56] Leon A. Gatys,et al. Image Style Transfer Using Convolutional Neural Networks , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[57] Li Fei-Fei,et al. Perceptual Losses for Real-Time Style Transfer and Super-Resolution , 2016, ECCV.
[58] Christopher D. Manning,et al. Effective Approaches to Attention-based Neural Machine Translation , 2015, EMNLP.
[59] Yoshua Bengio,et al. Attention-Based Models for Speech Recognition , 2015, NIPS.
[60] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[61] Ross B. Girshick,et al. Fast R-CNN , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[62] Sanjeev Khudanpur,et al. Librispeech: An ASR corpus based on public domain audio books , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[63] Trevor Darrell,et al. Fully convolutional networks for semantic segmentation , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[64] Yoshua Bengio,et al. How transferable are features in deep neural networks? , 2014, NIPS.
[65] Quoc V. Le,et al. Sequence to Sequence Learning with Neural Networks , 2014, NIPS.
[66] Michael S. Bernstein,et al. ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.
[67] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[68] Trevor Darrell,et al. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[69] Trevor Darrell,et al. DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition , 2013, ICML.
[70] Dong Yu,et al. Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition , 2012, IEEE Transactions on Audio, Speech, and Language Processing.
[71] Tomoki Toda,et al. Voice Conversion Based on Maximum-Likelihood Estimation of Spectral Parameter Trajectory , 2007, IEEE Transactions on Audio, Speech, and Language Processing.
[72] Hideki Kawahara,et al. Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds , 1999, Speech Commun..
[73] Eric Moulines,et al. Continuous probabilistic transform for voice conversion , 1998, IEEE Trans. Speech Audio Process..
[74] R. Kubichek,et al. Mel-cepstral distance measure for objective speech quality assessment , 1993, Proceedings of IEEE Pacific Rim Conference on Communications Computers and Signal Processing.
[75] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[76] Ilya Sutskever,et al. Language Models are Unsupervised Multitask Learners , 2019 .
[77] Alec Radford,et al. Improving Language Understanding by Generative Pre-Training , 2018 .
[78] Tara N. Sainath,et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups , 2012, IEEE Signal Processing Magazine.
[79] Geoffrey E. Hinton,et al. Visualizing Data using t-SNE , 2008, Journal of Machine Learning Research.
[80] Alan W. Black,et al. The CMU Arctic speech databases , 2004, SSW.
[81] Subjective evaluation of speech quality with a crowdsourcing approach , 2022.