Implementation of DNN-based real-time voice conversion and its improvements by audio data augmentation and mask-shaped device
Shinnosuke Takamichi | Hiroshi Saruwatari | Riku Arakawa
[1] Juan Pablo Bello,et al. A Software Framework for Musical Data Augmentation , 2015, ISMIR.
[2] Hideki Kawahara,et al. Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds , 1999, Speech Commun.
[3] Kun Li,et al. Voice conversion using deep Bidirectional Long Short-Term Memory based Recurrent Neural Networks , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[4] Luc Van Gool,et al. Deep Convolutional Neural Networks and Data Augmentation for Acoustic Event Detection , 2016, ArXiv.
[5] S. Boll,et al. Suppression of acoustic noise in speech using spectral subtraction , 1979.
[6] Sanjeev Khudanpur,et al. Audio augmentation for speech recognition , 2015, INTERSPEECH.
[7] T. Toda,et al. The NAIST Text-to-Speech System for the Blizzard Challenge 2015 , 2015, The Blizzard Challenge 2015.
[8] Tomoki Toda,et al. Postfilters to Modify the Modulation Spectrum for Statistical Parametric Speech Synthesis , 2016, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[9] Shinnosuke Takamichi,et al. Statistical Parametric Speech Synthesis Incorporating Generative Adversarial Networks , 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[10] Shinnosuke Takamichi,et al. Voice Conversion Using Input-to-Output Highway Networks , 2017, IEICE Trans. Inf. Syst.
[11] Heiga Zen,et al. Statistical parametric speech synthesis using deep neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[12] Keiichi Tokuda,et al. An adaptive algorithm for mel-cepstral analysis of speech , 1992, [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[13] Tomoki Toda,et al. Implementation of Computationally Efficient Real-Time Voice Conversion , 2012, INTERSPEECH.
[14] Justin Salamon,et al. Deep Convolutional Neural Networks and Data Augmentation for Environmental Sound Classification , 2016, IEEE Signal Processing Letters.
[15] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[16] Tomoki Toda,et al. Voice Conversion Based on Maximum-Likelihood Estimation of Spectral Parameter Trajectory , 2007, IEEE Transactions on Audio, Speech, and Language Processing.
[17] Tomoki Toda,et al. The NU-NAIST Voice Conversion System for the Voice Conversion Challenge 2016 , 2016, INTERSPEECH.
[18] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[19] Yoshua Bengio,et al. Generative Adversarial Nets , 2014, NIPS.
[20] Eric Moulines,et al. Continuous probabilistic transform for voice conversion , 1998, IEEE Trans. Speech Audio Process.
[21] Erich Elsen,et al. Deep Speech: Scaling up end-to-end speech recognition , 2014, ArXiv.
[22] Werner Verhelst,et al. An overlap-add technique based on waveform similarity (WSOLA) for high quality time-scale modification of speech , 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[23] Kou Tanaka,et al. A Hybrid Approach to Electrolaryngeal Speech Enhancement Based on Noise Reduction and Statistical Excitation Generation , 2014, IEICE Trans. Inf. Syst.
[24] Masanori Morise,et al. WORLD: A Vocoder-Based High-Quality Speech Synthesis System for Real-Time Applications , 2016, IEICE Trans. Inf. Syst.
[25] Tetsuya Takiguchi,et al. Voice conversion in high-order eigen space using deep belief nets , 2013, INTERSPEECH.