Unsupervised Speech Decomposition via Triple Information Bottleneck
[1] Manfred R. Schroeder, et al. Code-excited linear prediction (CELP): High-quality speech at very low bit rates, 1985, ICASSP '85. IEEE International Conference on Acoustics, Speech, and Signal Processing.
[2] Joan Serra, et al. Blow: a single-scale hyperconditioned flow for non-parallel raw-audio voice conversion, 2019, NeurIPS.
[3] Bishnu S. Atal, et al. Predictive coding of speech signals and subjective error criteria, 1978, ICASSP.
[4] Abeer Alwan, et al. Reducing F0 Frame Error of F0 tracking algorithms under noisy conditions with an unvoiced/voiced classification frontend, 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.
[5] S. Kotz, et al. Emotional speech processing: Disentangling the effects of prosody and semantic cues, 2011, Cognition & Emotion.
[6] Slava Shechtman, et al. Sequence to Sequence Neural Speech Synthesis with Prosody Modification Capabilities, 2019, 10th ISCA Workshop on Speech Synthesis (SSW 10).
[7] Yu-Huai Peng, et al. Unsupervised Representation Disentanglement Using Cross Domain Features and Adversarial Learning in Variational Autoencoder Based Voice Conversion, 2020, IEEE Transactions on Emerging Topics in Computational Intelligence.
[8] Yu Tsao, et al. Voice Conversion from Unaligned Corpora Using Variational Autoencoding Wasserstein Generative Adversarial Networks, 2017, INTERSPEECH.
[9] Jung-Woo Ha, et al. StarGAN: Unified Generative Adversarial Networks for Multi-domain Image-to-Image Translation, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[10] Kou Tanaka, et al. StarGAN-VC2: Rethinking Conditional Methods for StarGAN-Based Voice Conversion, 2019, INTERSPEECH.
[11] Yuxuan Wang, et al. Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron, 2018, ICML.
[12] Pat Langley, et al. Crafting Papers on Machine Learning, 2000, ICML.
[13] Tomohiro Nakatani, et al. A method for fundamental frequency estimation and voicing decision: Application to infant utterances recorded in real acoustical environments, 2008, Speech Commun.
[14] Emily Q. Wang, et al. Pitch targets and their realization: Evidence from Mandarin Chinese, 2001, Speech Commun.
[15] Yang Gao, et al. Voice Impersonation Using Generative Adversarial Networks, 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[16] Mark Hasegawa-Johnson, et al. AutoVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss, 2019, ICML.
[17] Yu Tsao, et al. Voice conversion from non-parallel corpora using variational auto-encoder, 2016, 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA).
[18] Vincent Wan, et al. CHiVE: Varying Prosody in Speech Synthesis with a Linguistically Driven Dynamic Hierarchical Conditional Variational Network, 2019, ICML.
[19] Hirokazu Kameoka, et al. Parallel-Data-Free Voice Conversion Using Cycle-Consistent Adversarial Networks, 2017, arXiv.
[20] Fadi Biadsy, et al. Parrotron: An End-to-End Speech-to-Speech Conversion Model and its Applications to Hearing-Impaired Speech and Speech Separation, 2019, INTERSPEECH.
[21] Guillaume Lample, et al. Fader Networks: Manipulating Images by Sliding Attributes, 2017, NIPS.
[22] Mark Hasegawa-Johnson, et al. Improvement of Probabilistic Acoustic Tube model for speech decomposition, 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[23] Kaiming He, et al. Group Normalization, 2018, ECCV.
[24] Thomas Quatieri, et al. Discrete-Time Speech Signal Processing: Principles and Practice, 2001.
[25] P. Mermelstein, et al. Distance measures for speech recognition, psychological and instrumental, 1976.
[26] Ryan Prenger, et al. Mellotron: Multispeaker Expressive Voice Synthesis by Conditioning on Rhythm, Pitch and Global Style Tokens, 2019, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[27] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[28] Lin-Shan Lee, et al. Multi-target Voice Conversion without Parallel Data by Adversarially Learning Disentangled Audio Representations, 2018, INTERSPEECH.
[29] Hung-yi Lee, et al. One-shot Voice Conversion by Separating Speaker and Content Representations with Instance Normalization, 2019, INTERSPEECH.
[30] Mark Hasegawa-Johnson, et al. F0-Consistent Many-To-Many Non-Parallel Voice Conversion Via Conditional Autoencoder, 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[31] Hideki Kawahara, et al. Tandem-STRAIGHT: A temporally stable power spectral representation for periodic signals and applications to interference-free spectrum, F0, and aperiodicity estimation, 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.
[32] Lior Wolf, et al. Unsupervised Singing Voice Conversion, 2019, INTERSPEECH.
[33] Junichi Yamagishi, et al. CSTR VCTK Corpus: English Multi-speaker Corpus for CSTR Voice Cloning Toolkit, 2016.
[34] Lior Wolf, et al. Attention-based WaveNet Autoencoder for Universal Voice Conversion, 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[35] Kou Tanaka, et al. StarGAN-VC: Non-Parallel Many-to-Many Voice Conversion Using Star Generative Adversarial Networks, 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).
[36] Yoshihiko Nankaku, et al. Statistical Voice Conversion Based on WaveNet, 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[37] Yu Tsao, et al. Voice Conversion Based on Cross-Domain Features Using Variational Auto Encoders, 2018, 2018 11th International Symposium on Chinese Spoken Language Processing (ISCSLP).
[38] Kou Tanaka, et al. ACVAE-VC: Non-parallel many-to-many voice conversion with auxiliary classifier variational autoencoder, 2018, arXiv.