Chuang Gan | Mark Hasegawa-Johnson | Shiyu Chang | Yang Zhang | Jinjun Xiong | Kaizhi Qian | David Cox
[1] Yu Tsao, et al. Unsupervised Representation Disentanglement Using Cross Domain Features and Adversarial Learning in Variational Autoencoder Based Voice Conversion, 2020, IEEE Transactions on Emerging Topics in Computational Intelligence.
[2] Fadi Biadsy, et al. Parrotron: An End-to-End Speech-to-Speech Conversion Model and its Applications to Hearing-Impaired Speech and Speech Separation, 2019, INTERSPEECH.
[3] Shinji Watanabe, et al. Joint CTC-Attention Based End-to-End Speech Recognition Using Multi-Task Learning, 2017, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[4] Yu Tsao, et al. Voice Conversion from Unaligned Corpora Using Variational Autoencoding Wasserstein Generative Adversarial Networks, 2017, INTERSPEECH.
[5] Vincent Wan, et al. CHiVE: Varying Prosody in Speech Synthesis with a Linguistically Driven Dynamic Hierarchical Conditional Variational Network, 2019, ICML.
[6] Hung-yi Lee, et al. One-Shot Voice Conversion by Separating Speaker and Content Representations with Instance Normalization, 2019, INTERSPEECH.
[7] John R. Hershey, et al. Hybrid CTC/Attention Architecture for End-to-End Speech Recognition, 2017, IEEE Journal of Selected Topics in Signal Processing.
[8] Tetsuya Takiguchi, et al. GMM-Based Emotional Voice Conversion Using Spectrum and Prosody Features, 2012.
[9] Heiga Zen, et al. WaveNet: A Generative Model for Raw Audio, 2016, SSW.
[10] Joan Serrà, et al. Blow: A Single-Scale Hyperconditioned Flow for Non-Parallel Raw-Audio Voice Conversion, 2019, NeurIPS.
[11] Mark Hasegawa-Johnson, et al. F0-Consistent Many-to-Many Non-Parallel Voice Conversion via Conditional Autoencoder, 2020, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[12] Yu Tsao, et al. Voice Conversion from Non-Parallel Corpora Using Variational Auto-Encoder, 2016, Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA).
[13] Thierry Dutoit, et al. The Emotional Voices Database: Towards Controlling the Emotion Dimension in Voice Generation Systems, 2018, arXiv.
[14] Berrak Sisman, et al. Transforming Spectrum and Prosody for Emotional Voice Conversion with Non-Parallel Training Data, 2020, Odyssey.
[15] Kou Tanaka, et al. StarGAN-VC2: Rethinking Conditional Methods for StarGAN-Based Voice Conversion, 2019, INTERSPEECH.
[16] Yoshihiko Nankaku, et al. Statistical Voice Conversion Based on WaveNet, 2018, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[17] Mark Hasegawa-Johnson, et al. Zero-Shot Voice Style Transfer with Only Autoencoder Loss, 2019, ICML.
[18] Hirokazu Kameoka, et al. Parallel-Data-Free Voice Conversion Using Cycle-Consistent Adversarial Networks, 2017, arXiv.
[19] Kaiming He, et al. Group Normalization, 2018, ECCV.
[20] Haizhou Li, et al. Deep Bidirectional LSTM Modeling of Timbre and Prosody for Emotional Voice Conversion, 2016, INTERSPEECH.
[21] Lior Wolf, et al. Unsupervised Singing Voice Conversion, 2019, INTERSPEECH.
[22] Haizhou Li, et al. VAW-GAN for Disentanglement and Recomposition of Emotional Elements in Speech, 2021, IEEE Spoken Language Technology Workshop (SLT).
[23] Jesús Villalba, et al. Self-Expressing Autoencoders for Unsupervised Spoken Term Discovery, 2020, INTERSPEECH.
[24] Tetsuya Takiguchi, et al. Emotional Voice Conversion Using Deep Neural Networks with MCC and F0 Features, 2016, IEEE/ACIS International Conference on Computer and Information Science (ICIS).
[25] Yang Gao, et al. Voice Impersonation Using Generative Adversarial Networks, 2018, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[26] Liwei Wang, et al. On Layer Normalization in the Transformer Architecture, 2020, ICML.
[27] Aijun Li, et al. Prosody Conversion from Neutral Speech to Emotional Speech, 2006, IEEE Transactions on Audio, Speech, and Language Processing.
[28] Lukasz Kaiser, et al. Attention Is All You Need, 2017, NIPS.
[29] Kou Tanaka, et al. StarGAN-VC: Non-Parallel Many-to-Many Voice Conversion Using Star Generative Adversarial Networks, 2018, IEEE Spoken Language Technology Workshop (SLT).
[30] Kou Tanaka, et al. ACVAE-VC: Non-Parallel Voice Conversion with Auxiliary Classifier Variational Autoencoder, 2019, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[31] Deva Ramanan, et al. Unsupervised Audiovisual Synthesis via Exemplar Autoencoders, 2021, ICLR.
[32] Junichi Yamagishi, et al. CSTR VCTK Corpus: English Multi-Speaker Corpus for CSTR Voice Cloning Toolkit, 2016.
[33] Mark Hasegawa-Johnson, et al. Unsupervised Speech Decomposition via Triple Information Bottleneck, 2020, ICML.
[34] Jung-Woo Ha, et al. StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation, 2018, IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[35] Ryan Prenger, et al. Mellotron: Multispeaker Expressive Voice Synthesis by Conditioning on Rhythm, Pitch and Global Style Tokens, 2020, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[36] Steve J. Young, et al. Data-Driven Emotion Conversion in Spoken English, 2009, Speech Communication.
[37] Yuxuan Wang, et al. Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis, 2018, ICML.
[38] Lin-Shan Lee, et al. Multi-Target Voice Conversion without Parallel Data by Adversarially Learning Disentangled Audio Representations, 2018, INTERSPEECH.
[39] Yuxuan Wang, et al. Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron, 2018, ICML.
[40] Chung-Hsien Wu, et al. Hierarchical Prosody Conversion Using Regression-Based Clustering for Emotional Speech Synthesis, 2010, IEEE Transactions on Audio, Speech, and Language Processing.
[41] Lior Wolf, et al. Attention-Based WaveNet Autoencoder for Universal Voice Conversion, 2019, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[42] Hung-yi Lee, et al. Towards Unsupervised Speech Recognition and Synthesis with Quantized Speech Representation Learning, 2020, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).