Towards Achieving Robust Universal Neural Vocoding
Jaime Lorenzo-Trueba | Thomas Drugman | Javier Latorre | Thomas Merritt | Bartosz Putrycz | Roberto Barra-Chicote | Alexis Moinet | Vatsal Aggarwal
[1] Haizhou Li, et al. A Voice Conversion Framework with Tandem Feature Sparse Representation and Speaker-Adapted WaveNet Vocoder, 2018, INTERSPEECH.
[2] Gregory Diamos, et al. Fast Spectrogram Inversion Using Multi-Head Convolutional Neural Networks, 2018, IEEE Signal Processing Letters.
[3] Heiga Zen, et al. WaveNet: A Generative Model for Raw Audio, 2016, SSW.
[4] Adam Finkelstein, et al. FFTNet: A Real-Time Speaker-Dependent Neural Vocoder, 2018, ICASSP.
[5] Tomoki Toda, et al. Collapsed speech segment detection and suppression for WaveNet vocoder, 2018, INTERSPEECH.
[6] Paavo Alku, et al. Comparison of multiple voice source parameters in different phonation types, 2007, INTERSPEECH.
[7] Simon King, et al. The Blizzard Challenge 2008, 2008.
[8] Dong Yu, et al. Rapid Style Adaptation Using Residual Error Embedding for Expressive Speech Synthesis, 2018, INTERSPEECH.
[9] Cassia Valentini-Botinhao. Noisy reverberant speech database for training speech enhancement algorithms and TTS models, 2017.
[10] Thierry Dutoit, et al. The Deterministic Plus Stochastic Model of the Residual Signal and Its Applications, 2012, IEEE Transactions on Audio, Speech, and Language Processing.
[11] Antoine Liutkus, et al. The 2016 Signal Separation Evaluation Campaign, 2017, LVA/ICA.
[12] Masanori Morise, et al. WORLD: A Vocoder-Based High-Quality Speech Synthesis System for Real-Time Applications, 2016, IEICE Transactions on Information and Systems.
[13] Wei Ping, et al. ClariNet: Parallel Wave Generation in End-to-End Text-to-Speech, 2018, ICLR.
[14] Cassia Valentini-Botinhao. Reverberant speech database for training speech dereverberation algorithms and TTS models, 2016.
[15] Mark A. Clements, et al. Speech concatenation and synthesis using an overlap-add sinusoidal model, 1996, ICASSP.
[16] Hideki Kawahara, et al. Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT, 2001, MAVEBA.
[17] Tomoki Toda, et al. An investigation of multi-speaker training for WaveNet vocoder, 2017, ASRU.
[18] Jae Lim, et al. Signal estimation from modified short-time Fourier transform, 1984, IEEE Transactions on Acoustics, Speech, and Signal Processing.
[19] J. Liljencrants, et al. A four-parameter model of glottal flow, 1985, STL-QPSR, Dept. for Speech, Music and Hearing, KTH.
[20] Li-Rong Dai, et al. WaveNet Vocoder with Limited Training Data for Voice Conversion, 2018, INTERSPEECH.
[21] Simon King, et al. Investigating source and filter contributions, and their interaction, to statistical parametric speech synthesis, 2014, INTERSPEECH.
[22] Simon King, et al. Attributing modelling errors in HMM synthesis by stepping gradually from natural to modelled speech, 2015, ICASSP.
[23] Heiga Zen, et al. The HMM-based speech synthesis system (HTS) version 2.0, 2007, SSW.
[24] Ryan Prenger, et al. WaveGlow: A Flow-Based Generative Network for Speech Synthesis, 2019, ICASSP.
[25] Lauri Juvela, et al. A Comparison of Recent Waveform Generation and Acoustic Modeling Methods for Neural-Network-Based Speech Synthesis, 2018, ICASSP.
[26] Cassia Valentini-Botinhao, et al. Noisy speech database for training speech enhancement algorithms and TTS models, 2017.
[27] Heiga Zen, et al. Parallel WaveNet: Fast High-Fidelity Speech Synthesis, 2017, ICML.
[28] Patrick Nguyen, et al. Transfer Learning from Speaker Verification to Multispeaker Text-to-Speech Synthesis, 2018, NeurIPS.
[29] Navdeep Jaitly, et al. Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions, 2018, ICASSP.
[30] Adam Nadolski, et al. Comprehensive Evaluation of Statistical Speech Waveform Synthesis, 2018, IEEE SLT.
[31] S. Scott, et al. When voices get emotional: A corpus of nonverbal vocalizations for research on emotion processing, 2013, Behavior Research Methods.
[32] Colin Raffel, et al. librosa: Audio and Music Signal Analysis in Python, 2015, SciPy.
[33] Erich Elsen, et al. Efficient Neural Audio Synthesis, 2018, ICML.
[34] Laurent Besacier, et al. Collecting Resources in Sub-Saharan African Languages for Automatic Speech Recognition: A Case Study of Wolof, 2016, LREC.