Fast and High-Quality Singing Voice Synthesis System Based on Convolutional Neural Networks
暂无分享,去创建一个
Yoshihiko Nankaku | Keiichi Tokuda | Kazuhiro Nakamura | Kei Hashimoto | Shinji Takaki | Keiichiro Oura
[1] Frank K. Soong,et al. TTS synthesis with bidirectional LSTM based recurrent neural networks , 2014, INTERSPEECH.
[2] Tara N. Sainath,et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups , 2012, IEEE Signal Processing Magazine.
[3] Heiga Zen,et al. Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[4] Hideki Kawahara,et al. Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds , 1999, Speech Commun..
[5] Heiga Zen,et al. Statistical Parametric Speech Synthesis , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.
[6] Yoshihiko Nankaku,et al. Trajectory training considering global variance for speech synthesis based on neural networks , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[7] Frank K. Soong,et al. On the training aspects of Deep Neural Network (DNN) for parametric TTS synthesis , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[8] Anthony J. Robinson,et al. Static and Dynamic Error Propagation Networks with Application to Speech Coding , 1987, NIPS.
[9] Keiichi Tokuda,et al. Speech parameter generation algorithms for HMM-based speech synthesis , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).
[10] Ming Zhou,et al. Close to Human Quality TTS with Transformer , 2018, ArXiv.
[11] Erich Elsen,et al. Efficient Neural Audio Synthesis , 2018, ICML.
[12] Trevor Darrell,et al. Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[13] Tomoki Toda,et al. Speaker-Dependent WaveNet Vocoder , 2017, INTERSPEECH.
[14] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[15] Kyogu Lee,et al. Adversarially Trained End-to-end Korean Singing Voice Synthesis System , 2019, INTERSPEECH.
[16] Bo Xu,et al. Combining unidirectional long short-term memory with convolutional output layer for high-performance speech synthesis , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[17] Heiga Zen,et al. WaveNet: A Generative Model for Raw Audio , 2016, SSW.
[18] Bhuvana Ramabhadran,et al. Prosody contour prediction with long short-term memory, bi-directional, deep recurrent neural networks , 2014, INTERSPEECH.
[19] Navdeep Jaitly,et al. Natural TTS Synthesis by Conditioning Wavenet on MEL Spectrogram Predictions , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[20] Li-Rong Dai,et al. Singing Voice Synthesis Using Deep Autoregressive Neural Networks for Acoustic Modeling , 2019, INTERSPEECH.
[21] Yoshihiko Nankaku,et al. Mel-Cepstrum-Based Quantization Noise Shaping Applied to Neural-Network-Based Speech Waveform Synthesis , 2018, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[22] Zhizheng Wu,et al. From HMMS to DNNS: Where do the improvements come from? , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[23] Yoshihiko Nankaku,et al. Recent Development of the DNN-based Singing Voice Synthesis System — Sinsy , 2018, 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC).
[24] Yoshua Bengio,et al. SampleRNN: An Unconditional End-to-End Neural Audio Generation Model , 2016, ICLR.
[25] Yoshihiko Nankaku,et al. Singing Voice Synthesis Based on Deep Neural Networks , 2016, INTERSPEECH.
[26] Adam Finkelstein,et al. Fftnet: A Real-Time Speaker-Dependent Neural Vocoder , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[27] Jordi Bonada,et al. A Neural Parametric Singing Synthesizer , 2017, INTERSPEECH.
[28] Yoshihiko Nankaku,et al. The effect of neural networks in statistical parametric speech synthesis , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[29] Heiga Zen,et al. Statistical parametric speech synthesis using deep neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[30] Ryan Prenger,et al. Waveglow: A Flow-based Generative Network for Speech Synthesis , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).