Cyclic Spectral Modeling for Unsupervised Unit Discovery into Voice Conversion with Excitation and Waveform Modeling
Tomoki Toda | Yi-Chiao Wu | Tomoki Hayashi | Kazuhiro Kobayashi | Patrick Lumban Tobing