Group Latent Embedding for Vector Quantized Variational Autoencoder in Non-Parallel Voice Conversion
[1] Taehwan Kim, et al. Investigation of Using Disentangled and Interpretable Representations for One-shot Cross-lingual Voice Conversion, 2018, INTERSPEECH.
[2] Patrick Kenny, et al. Bayesian Speaker Verification with Heavy-Tailed Priors, 2010, Odyssey.
[3] M. Yuan, et al. Model selection and estimation in regression with grouped variables, 2006.
[4] Tomoki Toda, et al. Refined WaveNet Vocoder for Variational Autoencoder Based Voice Conversion, 2018, 2019 27th European Signal Processing Conference (EUSIPCO).
[5] Zhizheng Wu, et al. Voice conversion and spoofing attack on speaker verification systems, 2013, 2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference.
[6] Masanori Morise, et al. D4C, a band-aperiodicity estimator for high-quality speech synthesis, 2016, Speech Commun.
[7] Yoshua Bengio, et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation, 2014, EMNLP.
[8] Tomoki Toda, et al. Voice Conversion Based on Maximum-Likelihood Estimation of Spectral Parameter Trajectory, 2007, IEEE Transactions on Audio, Speech, and Language Processing.
[9] Luca Antiga, et al. Automatic differentiation in PyTorch, 2017.
[10] Yu Zhang, et al. Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data, 2017, NIPS.
[11] Murray Shanahan, et al. Deep Unsupervised Clustering with Gaussian Mixture Variational Autoencoders, 2016, ArXiv.
[12] Shinnosuke Takamichi, et al. Non-Parallel Voice Conversion Using Variational Autoencoders Conditioned by Phonetic Posteriorgrams and D-Vectors, 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[13] Tetsuya Takiguchi, et al. Non-Parallel Training in Voice Conversion Using an Adaptive Restricted Boltzmann Machine, 2016, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[14] Kou Tanaka, et al. ACVAE-VC: Non-parallel many-to-many voice conversion with auxiliary classifier variational autoencoder, 2018, ArXiv.
[15] Haizhou Li, et al. Exemplar-based voice conversion using joint nonnegative matrix factorization, 2015, Multimedia Tools and Applications.
[16] Kun Li, et al. Voice conversion using deep Bidirectional Long Short-Term Memory based Recurrent Neural Networks, 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[17] Ricardo Gutierrez-Osuna, et al. Developing Objective Measures of Foreign-Accent Conversion, 2010, IEEE Transactions on Audio, Speech, and Language Processing.
[18] Alan W. Black, et al. The CMU Arctic speech databases, 2004, SSW.
[19] Tomoki Toda, et al. Speaker-Dependent WaveNet Vocoder, 2017, INTERSPEECH.
[20] Kishore Prahallad, et al. Spectral Mapping Using Artificial Neural Networks for Voice Conversion, 2010, IEEE Transactions on Audio, Speech, and Language Processing.
[21] Masanori Morise, et al. WORLD: A Vocoder-Based High-Quality Speech Synthesis System for Real-Time Applications, 2016, IEICE Trans. Inf. Syst.
[22] Ricardo Gutierrez-Osuna, et al. Accent Conversion Using Phonetic Posteriorgrams, 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[23] Ricardo Gutierrez-Osuna, et al. SABR: sparse, anchor-based representation of the speech signal, 2015, INTERSPEECH.
[24] Ricardo Gutierrez-Osuna, et al. Foreign accent conversion in computer assisted pronunciation training, 2009, Speech Commun.
[25] Seyed Hamidreza Mohammadi, et al. A Voice Conversion Mapping Function Based on a Stacked Joint-Autoencoder, 2016, INTERSPEECH.
[26] Barnabás Póczos, et al. Online group-structured dictionary learning, 2011, CVPR 2011.
[27] Samy Bengio, et al. Generating Sentences from a Continuous Space, 2015, CoNLL.
[28] Yu Tsao, et al. Voice conversion from non-parallel corpora using variational auto-encoder, 2016.
[29] Alexander Kain, et al. Spectral voice conversion for text-to-speech synthesis, 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98.
[30] Huachun Tan, et al. Variational Deep Embedding: An Unsupervised and Generative Approach to Clustering, 2016, IJCAI.
[31] Ole Winther, et al. How to Train Deep Variational Autoencoders and Probabilistic Ladder Networks, 2016, ICML.
[32] Heiga Zen, et al. WaveNet: A Generative Model for Raw Audio, 2016, SSW.
[33] Erik McDermott, et al. Deep neural networks for small footprint text-dependent speaker verification, 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[34] Yu Tsao, et al. Voice Conversion from Unaligned Corpora Using Variational Autoencoding Wasserstein Generative Adversarial Networks, 2017, INTERSPEECH.
[35] Oriol Vinyals, et al. Neural Discrete Representation Learning, 2017, NIPS.
[36] Erich Elsen, et al. Efficient Neural Audio Synthesis, 2018, ICML.
[37] Sabine Buchholz, et al. Crowdsourcing Preference Tests, and How to Detect Cheating, 2011, INTERSPEECH.
[38] Eric Moulines, et al. Continuous probabilistic transform for voice conversion, 1998, IEEE Trans. Speech Audio Process.
[39] Sanjeev Khudanpur, et al. X-Vectors: Robust DNN Embeddings for Speaker Recognition, 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[40] Daniel Erro, et al. INCA Algorithm for Training Voice Conversion Systems From Nonparallel Corpora, 2010, IEEE Transactions on Audio, Speech, and Language Processing.
[41] Junichi Yamagishi, et al. CSTR VCTK Corpus: English Multi-speaker Corpus for CSTR Voice Cloning Toolkit, 2017.
[42] Max Welling, et al. Improved Variational Inference with Inverse Autoregressive Flow, 2016, NIPS.
[43] Haifeng Li, et al. A KL Divergence and DNN-Based Approach to Voice Conversion without Parallel Training Sentences, 2016, INTERSPEECH.
[44] Tetsuya Takiguchi, et al. Exemplar-based voice conversion in noisy environment, 2012, 2012 IEEE Spoken Language Technology Workshop (SLT).
[45] Li-Rong Dai, et al. Voice Conversion Using Deep Neural Networks With Layer-Wise Generative Training, 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[46] Donald J. Berndt, et al. Using Dynamic Time Warping to Find Patterns in Time Series, 1994, KDD Workshop.