LMCodec: A Low Bitrate Speech Codec with Causal Transformer Models
W. Kleijn | J. Skoglund | Neil Zeghidour | Michael Chinen | M. Tagliasacchi | Zalán Borsos | Teerapat Jenrungrot
[1] David Grangier, et al. AudioLM: A Language Modeling Approach to Audio Generation, 2022, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[2] W. Kleijn, et al. Ultra-Low-Bitrate Speech Coding with Pretrained Transformers, 2022, INTERSPEECH.
[3] J. Yamagishi, et al. Generalization Ability of MOS Prediction Networks, 2021, ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[4] Marco Tagliasacchi, et al. SoundStream: An End-to-End Neural Audio Codec, 2022, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[5] Minje Kim, et al. Harp-Net: Hyper-Autoencoded Reconstruction Propagation for Scalable Neural Audio Coding, 2021, 2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA).
[6] Eugene Kharitonov, et al. Speech Resynthesis from Discrete Disentangled Self-Supervised Representations, 2021, Interspeech.
[7] Andrew Hines, et al. Warp-Q: Quality Prediction for Generative Neural Speech Codecs, 2021, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[8] Yu Zhang, et al. Conformer: Convolution-augmented Transformer for Speech Recognition, 2020, INTERSPEECH.
[9] Andrew Hines, et al. ViSQOL v3: An Open Source Production Ready Objective Speech and Audio Metric, 2020, 2020 Twelfth International Conference on Quality of Multimedia Experience (QoMEX).
[10] Abdel-rahman Mohamed, et al. Libri-Light: A Benchmark for ASR with Limited or No Supervision, 2019, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[11] Junichi Yamagishi, et al. CSTR VCTK Corpus: English Multi-speaker Corpus for CSTR Voice Cloning Toolkit (version 0.92), 2019.
[12] Minje Kim, et al. Cascaded Cross-Module Residual Learning towards Lightweight End-to-End Speech Coding, 2019, INTERSPEECH.
[13] Thomas C. Walters, et al. Low Bit-rate Speech Coding with VQ-VAE and a WaveNet Decoder, 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[14] Jan Skoglund, et al. LPCNET: Improving Neural Speech Synthesis through Linear Prediction, 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[15] Noam Shazeer, et al. Adafactor: Adaptive Learning Rates with Sublinear Memory Cost, 2018, ICML.
[16] Srihari Kankanahalli, et al. End-To-End Optimized Speech Coding with Deep Neural Networks, 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[17] Jean-Marc Valin, et al. A Hybrid DSP/Deep Learning Approach to Real-Time Full-Band Speech Enhancement, 2017, 2018 IEEE 20th International Workshop on Multimedia Signal Processing (MMSP).
[18] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[19] Zhe Wang, et al. Overview of the EVS codec architecture, 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[20] Sanjeev Khudanpur, et al. Librispeech: An ASR corpus based on public domain audio books, 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[21] Jodi Kearns, et al. LibriVox: Free Public Domain Audiobooks, 2014.
[22] Timothy B. Terriberry, et al. Definition of the Opus Audio Codec, 2012, RFC.
[23] Shigeo Morishima, et al. Speech coding based on a multi-layer neural network, 1990, IEEE International Conference on Communications, Including Supercomm Technical Sessions.