暂无分享,去创建一个
Zhe Gan | Lawrence Carin | Ruiyi Zhang | Weituo Hao | Pengyu Cheng | Siyang Yuan | L. Carin | Zhe Gan | Ruiyi Zhang | Pengyu Cheng | Weituo Hao | Siyang Yuan
[1] Serge J. Belongie,et al. Arbitrary Style Transfer in Real-Time with Adaptive Instance Normalization , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[2] Stefano Ermon,et al. Learning Controllable Fair Representations , 2018, AISTATS.
[3] Mark Hasegawa-Johnson,et al. Zero-Shot Voice Style Transfer with Only Autoencoder Loss , 2019, ICML.
[4] Shakir Mohamed,et al. Variational Inference with Normalizing Flows , 2015, ICML.
[5] Haizhou Li,et al. Adaptive Wavenet Vocoder for Residual Compensation in GAN-Based Voice Conversion , 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).
[6] Vladimir Pavlovic,et al. Unsupervised Multi-Target Domain Adaptation: An Information Theoretic Approach , 2018, IEEE Transactions on Image Processing.
[7] Sylvain Paris,et al. Deep Photo Style Transfer , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[8] Kou Tanaka,et al. StarGAN-VC: non-parallel many-to-many Voice Conversion Using Star Generative Adversarial Networks , 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).
[9] Hirokazu Kameoka,et al. CycleGAN-VC: Non-parallel Voice Conversion Using Cycle-Consistent Adversarial Networks , 2018, 2018 26th European Signal Processing Conference (EUSIPCO).
[10] Yoshua Bengio,et al. Learning deep representations by mutual information estimation and maximization , 2018, ICLR.
[11] Lawrence Carin,et al. Improving Disentangled Text Representation Learning with Information-Theoretic Guidance , 2020, ACL.
[12] Joon Son Chung,et al. VoxCeleb: A Large-Scale Speaker Identification Dataset , 2017, INTERSPEECH.
[13] Leon A. Gatys,et al. Image Style Transfer Using Convolutional Neural Networks , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[14] Roger B. Grosse,et al. Isolating Sources of Disentanglement in Variational Autoencoders , 2018, NeurIPS.
[15] Chung-Hsien Wu,et al. Map-based adaptation for speech conversion using adaptation data selection and non-parallel training , 2006, INTERSPEECH.
[16] Alexander A. Alemi,et al. Deep Variational Information Bottleneck , 2017, ICLR.
[17] Hung-yi Lee,et al. One-shot Voice Conversion by Separating Speaker and Content Representations with Instance Normalization , 2019, INTERSPEECH.
[18] Quan Wang,et al. Generalized End-to-End Loss for Speaker Verification , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[19] Martin J. Wainwright,et al. Estimating Divergence Functionals and the Likelihood Ratio by Convex Risk Minimization , 2008, IEEE Transactions on Information Theory.
[20] Junichi Yamagishi,et al. CSTR VCTK Corpus: English Multi-speaker Corpus for CSTR Voice Cloning Toolkit , 2017 .
[21] Prafulla Dhariwal,et al. Glow: Generative Flow with Invertible 1x1 Convolutions , 2018, NeurIPS.
[22] Yu Tsao,et al. Voice conversion from non-parallel corpora using variational auto-encoder , 2016, 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA).
[23] Santosh Chapaneri,et al. Spoken Digits Recognition using Weighted MFCC and Improved Features for Dynamic Time Warping , 2012 .
[24] Alfred O. Hero,et al. Ensemble estimation of multivariate f-divergence , 2014, 2014 IEEE International Symposium on Information Theory.
[25] Geoffrey E. Hinton,et al. Visualizing Data using t-SNE , 2008 .
[26] Heiga Zen,et al. WaveNet: A Generative Model for Raw Audio , 2016, SSW.
[27] Geeta Nijhawan,et al. ISOLATED SPEECH RECOGNITIONUSING MFCC AND DTW , 2013 .
[28] Andrea Vedaldi,et al. Instance Normalization: The Missing Ingredient for Fast Stylization , 2016, ArXiv.
[29] Guillaume Desjardins,et al. Understanding disentangling in $\beta$-VAE , 2018, 1804.03599.
[30] Pietro Liò,et al. Deep Graph Infomax , 2018, ICLR.
[31] Joan Serra,et al. Blow: a single-scale hyperconditioned flow for non-parallel raw-audio voice conversion , 2019, NeurIPS.
[32] Eric P. Xing,et al. Unsupervised Text Style Transfer using Language Models as Discriminators , 2018, NeurIPS.
[33] Olivier Rosec,et al. Voice Conversion Using Dynamic Frequency Warping With Amplitude Scaling, for Parallel or Nonparallel Corpora , 2012, IEEE Transactions on Audio, Speech, and Language Processing.
[34] Hao Wang,et al. Real-Time Neural Style Transfer for Videos , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[35] Tomoki Toda,et al. Speaking aid system for total laryngectomees using voice conversion of body transmitted artificial speech , 2006, INTERSPEECH.
[36] Pieter Abbeel,et al. InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets , 2016, NIPS.
[37] Guillaume Lample,et al. Multiple-Attribute Text Rewriting , 2018, ICLR.
[38] Seyed Hamidreza Mohammadi,et al. An overview of voice conversion systems , 2017, Speech Commun..
[39] Christopher Burgess,et al. DARLA: Improving Zero-Shot Transfer in Reinforcement Learning , 2017, ICML.
[40] Zhe Gan,et al. CLUB: A Contrastive Log-ratio Upper Bound of Mutual Information , 2020, ICML.
[41] Regina Barzilay,et al. Style Transfer from Non-Parallel Text by Cross-Alignment , 2017, NIPS.
[42] I. Elamvazuthi,et al. Voice Recognition Algorithms using Mel Frequency Cepstral Coefficient (MFCC) and Dynamic Time Warping (DTW) Techniques , 2010, ArXiv.
[43] Jordi Bonada,et al. Applying voice conversion to concatenative singing-voice synthesis , 2010, INTERSPEECH.
[44] Christopher Burgess,et al. beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework , 2016, ICLR 2016.
[45] Aapo Hyvärinen,et al. Noise-contrastive estimation: A new estimation principle for unnormalized statistical models , 2010, AISTATS.
[46] Andriy Mnih,et al. Disentangling by Factorising , 2018, ICML.
[47] R. Kubichek,et al. Mel-cepstral distance measure for objective speech quality assessment , 1993, Proceedings of IEEE Pacific Rim Conference on Communications Computers and Signal Processing.
[48] Thomas M. Cover,et al. Elements of Information Theory , 2005 .
[49] Nenghai Yu,et al. Coherent Online Video Style Transfer , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[50] Shinnosuke Takamichi,et al. Non-Parallel Voice Conversion Using Variational Autoencoders Conditioned by Phonetic Posteriorgrams and D-Vectors , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[51] Oriol Vinyals,et al. Representation Learning with Contrastive Predictive Coding , 2018, ArXiv.
[52] Zhizheng Wu,et al. Analysis of the Voice Conversion Challenge 2016 Evaluation Results , 2016, INTERSPEECH.
[53] Sanjeev Khudanpur,et al. Librispeech: An ASR corpus based on public domain audio books , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[54] Bernhard Schölkopf,et al. Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations , 2018, ICML.
[55] Donald J. Berndt,et al. Using Dynamic Time Warping to Find Patterns in Time Series , 1994, KDD Workshop.
[56] Jung-Woo Ha,et al. StarGAN: Unified Generative Adversarial Networks for Multi-domain Image-to-Image Translation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.