WOLONet: Wave Outlooker for Efficient and High Fidelity Speech Synthesis
暂无分享,去创建一个
[1] Andreas Zell,et al. Seeing Implicit Neural Representations as Fourier Series , 2021, 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV).
[2] Shuicheng Yan,et al. VOLO: Vision Outlooker for Visual Recognition , 2021, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[3] Jaesam Yoon,et al. UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation , 2021, Interspeech.
[4] Jing Xiao,et al. LVCNet: Efficient Condition-Dependent Modeling Network for Waveform Generation , 2021, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[5] Jing Xiao,et al. MelGlow: Efficient Waveform Generative Network Based on Location-Variable Convolution , 2020, ArXiv.
[6] Guillaume Fuchs,et al. StyleMelGAN: An Efficient High-Fidelity Adversarial Vocoder with Temporal Adaptive Normalization , 2020, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[7] Ryuichi Yamamoto,et al. Parallel Waveform Synthesis Based on Generative Adversarial Networks with Voicing-Aware Conditional Discriminators , 2020, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[8] Heiga Zen,et al. WaveGrad: Estimating Gradients for Waveform Generation , 2020, ICLR.
[9] Kurt Keutzer,et al. FBWave: Efficient and Scalable Neural Vocoders for Streaming Text-To-Speech on the Edge , 2020, ArXiv.
[10] D. Lim,et al. Universal MelGAN: A Robust Neural Vocoder for High-Fidelity Waveform Generation in Multiple Domains , 2020, ArXiv.
[11] Jaehyeon Kim,et al. HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis , 2020, NeurIPS.
[12] Youngik Kim,et al. VocGAN: A High-Fidelity Real-time Vocoder with a Hierarchically-nested Adversarial Network , 2020, INTERSPEECH.
[13] Gordon Wetzstein,et al. Implicit Neural Representations with Periodic Activation Functions , 2020, NeurIPS.
[14] FeatherWave: An efficient high-fidelity neural vocoder with multi-band linear prediction , 2020, INTERSPEECH.
[15] Wei Ping,et al. WaveFlow: A Compact Flow-based Model for Raw Audio , 2019, ICML.
[16] Ryuichi Yamamoto,et al. Parallel Wavegan: A Fast Waveform Generation Model Based on Generative Adversarial Networks with Multi-Resolution Spectrogram , 2019, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[17] Yoshua Bengio,et al. MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis , 2019, NeurIPS.
[18] Chengzhu Yu,et al. DurIAN: Duration Informed Attention Network For Multimodal Synthesis , 2019, ArXiv.
[19] Sungwon Kim,et al. FloWaveNet : A Generative Flow for Raw Audio , 2018, ICML.
[20] Ryan Prenger,et al. Waveglow: A Flow-based Generative Network for Speech Synthesis , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[21] Jan Skoglund,et al. LPCNET: Improving Neural Speech Synthesis through Linear Prediction , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[22] Wei Ping,et al. ClariNet: Parallel Wave Generation in End-to-End Text-to-Speech , 2018, ICLR.
[23] Yuxuan Wang,et al. Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron , 2018, ICML.
[24] Erich Elsen,et al. Efficient Neural Audio Synthesis , 2018, ICML.
[25] Mark Sandler,et al. MobileNetV2: Inverted Residuals and Linear Bottlenecks , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[26] Heiga Zen,et al. Parallel WaveNet: Fast High-Fidelity Speech Synthesis , 2017, ICML.
[27] Bo Chen,et al. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications , 2017, ArXiv.
[28] Yoshua Bengio,et al. SampleRNN: An Unconditional End-to-End Neural Audio Generation Model , 2016, ICLR.
[29] Heiga Zen,et al. WaveNet: A Generative Model for Raw Audio , 2016, SSW.
[30] Masanori Morise,et al. WORLD: A Vocoder-Based High-Quality Speech Synthesis System for Real-Time Applications , 2016, IEICE Trans. Inf. Syst..
[31] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[32] Hideki Kawahara,et al. STRAIGHT, exploitation of the other aspect of VOCODER: Perceptually isomorphic decomposition of speech sounds , 2006 .
[33] Andries P. Hekstra,et al. Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).