Non-parallel and many-to-many voice conversion using variational autoencoders integrating speech recognition and speaker verification
Shinnosuke Takamichi | Yusuke Ijima | Kyosuke Nishida | Yuki Saito | Taiki Nakamura
[1] Georg Heigold, et al. End-to-end text-dependent speaker verification, 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[2] Tara N. Sainath, et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups, 2012, IEEE Signal Processing Magazine.
[3] Samy Bengio, et al. Generating Sentences from a Continuous Space, 2015, CoNLL.
[4] Max Welling, et al. Semi-supervised Learning with Deep Generative Models, 2014, NIPS.
[5] Max Welling, et al. Auto-Encoding Variational Bayes, 2013, ICLR.
[6] Tomoki Toda, et al. Singing voice conversion method based on many-to-many eigenvoice conversion and training data generation using a singing-to-singing synthesis system, 2012, Proceedings of the 2012 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference.
[7] Hao Wang, et al. Phonetic posteriorgrams for many-to-one voice conversion without parallel data training, 2016, 2016 IEEE International Conference on Multimedia and Expo (ICME).
[8] Junichi Yamagishi, et al. Adapting and controlling DNN-based speech synthesis using input codes, 2017, ICASSP.
[9] Ricardo Gutierrez-Osuna, et al. Can voice conversion be used to reduce non-native accents?, 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[10] Yoram Singer, et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res.
[11] M. Sambur, et al. Selection of acoustic features for speaker identification, 1975.
[12] Shinnosuke Takamichi, et al. Statistical Parametric Speech Synthesis Incorporating Generative Adversarial Networks, 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[13] John-Paul Hosom, et al. Improving the intelligibility of dysarthric speech, 2007, Speech Commun.
[14] Hirokazu Kameoka, et al. CycleGAN-VC: Non-parallel Voice Conversion Using Cycle-Consistent Adversarial Networks, 2018, 2018 26th European Signal Processing Conference (EUSIPCO).
[15] Erik McDermott, et al. Deep neural networks for small footprint text-dependent speaker verification, 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[16] Junichi Yamagishi, et al. High-Quality Nonparallel Voice Conversion Based on Cycle-Consistent Adversarial Network, 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[17] Eric Moulines, et al. Continuous probabilistic transform for voice conversion, 1998, IEEE Trans. Speech Audio Process.
[18] Kishore Prahallad, et al. Voice conversion using Artificial Neural Networks, 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.
[19] Sebastian Stüker, et al. Multilingual Adaptation of RNN Based ASR Systems, 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[20] Alexei A. Efros, et al. Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[21] Satoshi Nakamura, et al. Listening while speaking: Speech chain by deep learning, 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[22] Shinnosuke Takamichi, et al. Voice Conversion Using Sequence-to-Sequence Learning of Context Posterior Probabilities, 2017, INTERSPEECH.
[23] Hideki Kawahara, et al. Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds, 1999, Speech Commun.
[24] Yoshua Bengio, et al. Generative Adversarial Nets, 2014, NIPS.
[25] Ying Zhang, et al. Towards End-to-End Speech Recognition with Deep Convolutional Neural Networks, 2016, INTERSPEECH.
[26] Matthias Wölfel. Channel selection by class separability measures for automatic transcriptions on distant microphones, 2007, INTERSPEECH.
[27] Keiichi Tokuda, et al. Speech parameter generation algorithms for HMM-based speech synthesis, 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).
[28] Yusuke Ijima, et al. DNN-Based Speech Synthesis Using Speaker Codes, 2018, IEICE Trans. Inf. Syst.
[29] Yu Tsao, et al. Voice conversion from non-parallel corpora using variational auto-encoder, 2016, 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA).
[30] Tomoki Toda, et al. Voice Conversion Based on Maximum-Likelihood Estimation of Spectral Parameter Trajectory, 2007, IEEE Transactions on Audio, Speech, and Language Processing.
[31] Murray Shanahan, et al. Deep Unsupervised Clustering with Gaussian Mixture Variational Autoencoders, 2016, arXiv.
[32] Yoshua Bengio, et al. Deep Sparse Rectifier Neural Networks, 2011, AISTATS.
[33] Shinnosuke Takamichi, et al. Non-Parallel Voice Conversion Using Variational Autoencoders Conditioned by Phonetic Posteriorgrams and D-Vectors, 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[34] Tomoki Toda, et al. Intra-gender statistical singing voice conversion with direct waveform modification using log-spectral differential, 2018, Speech Commun.
[35] Takumi Sugiyama, et al. Study report on "Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks", 2017.