AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
Jia-Bin Huang, Shinji Watanabe, Zhenhui Ye, Jiatong Shi, Dongchao Yang, Rongjie Huang, Zhou Zhao, Jinglin Liu, Xuankai Chang, Yuning Wu, Mingze Li, Zhiqing Hong, Yixiang Ren
[1] Xu Tan, et al. HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in HuggingFace, 2023, ArXiv.
[2] Chenfei Wu, et al. Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models, 2023, ArXiv.
[3] Li Dong, et al. Language Is Not All You Need: Aligning Perception with Language Models, 2023, NeurIPS.
[4] Naman Goyal, et al. LLaMA: Open and Efficient Foundation Language Models, 2023, ArXiv.
[5] Zhenhui Ye, et al. GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis, 2023, ICLR.
[6] Jia-Bin Huang, et al. Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models, 2023, ICML.
[7] Timo I. Denk, et al. MusicLM: Generating Music From Text, 2023, ArXiv.
[8] Chao Weng, et al. Diffsound: Discrete Diffusion Model for Text-to-Sound Generation, 2022, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[9] Benoît Sagot, et al. Generative Spoken Dialogue Language Modeling, 2022, TACL.
[10] Jong Wook Kim, et al. Robust Speech Recognition via Large-Scale Weak Supervision, 2022, ICML.
[11] Gabriel Synnaeve, et al. High Fidelity Neural Audio Compression, 2022, ArXiv.
[12] Dongchao Yang, et al. Audio Pyramid Transformer with Domain Adaption for Weakly Supervised Sound Event Detection and Audio Classification, 2022, INTERSPEECH.
[13] Shinji Watanabe, et al. TF-GRIDNET: Making Time-Frequency Domain Models Great Again for Monaural Speaker Separation, 2022, ICASSP 2023.
[14] David Grangier, et al. AudioLM: A Language Modeling Approach to Audio Generation, 2022, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[15] Yi Ren, et al. TranSpeech: Speech-to-Speech Translation With Bilateral Perturbation, 2022, ICLR.
[16] Yi Ren, et al. GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech Synthesis, 2022, NeurIPS.
[17] Xi Victoria Lin, et al. OPT: Open Pre-trained Transformer Language Models, 2022, ArXiv.
[18] Max W. Y. Lam, et al. FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis, 2022, IJCAI.
[19] Qiuqiang Kong, et al. Separate What You Describe: Language-Queried Audio Source Separation, 2022, INTERSPEECH.
[20] Ryan J. Lowe, et al. Training language models to follow instructions with human feedback, 2022, NeurIPS.
[21] Abdel-rahman Mohamed, et al. textless-lib: a Library for Textless Spoken Language Processing, 2022, NAACL.
[22] Renelito Delos Santos, et al. LaMDA: Language Models for Dialog Applications, 2022, ArXiv.
[23] Lei Xie, et al. VISinger: Variational Inference with Adversarial Learning for End-to-End Singing Voice Synthesis, 2021, ICASSP 2022.
[24] Quoc V. Le, et al. Finetuned Language Models Are Zero-Shot Learners, 2021, ICLR.
[25] Zhou Zhao, et al. DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism, 2021, AAAI.
[26] Marco Tagliasacchi, et al. SoundStream: An End-to-End Neural Audio Codec, 2022, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[27] Haozhe Wu, et al. Imitating Arbitrary Talking Style for Realistic Audio-Driven Talking Face Synthesis, 2021, ACM Multimedia.
[28] Zhou Zhao, et al. Multi-Singer: Fast Multi-Singer Singing Voice Vocoder With A Large-Scale Corpus, 2021, ACM Multimedia.
[29] Helin Wang, et al. Improving the Performance of Automated Audio Captioning via Integrating the Acoustic and Semantic Information, 2021, DCASE.
[30] Ruslan Salakhutdinov, et al. HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units, 2021, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[31] Florian Metze, et al. Searchable Hidden Intermediates for End-to-End Models of Decomposable Sequence Tasks, 2021, NAACL.
[32] Tie-Yan Liu, et al. FastSpeech 2: Fast and High-Quality End-to-End Text to Speech, 2020, ICLR.
[33] Fang Liu, et al. Multi-task Learning based Pre-trained Language Model for Code Completion, 2020, ASE.
[34] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.
[35] Shinji Watanabe, et al. DiscreTalk: Text-to-Speech as a Machine Translation Problem, 2020, ArXiv.
[36] R. Socher, et al. A Simple Language Model for Task-Oriented Dialogue, 2020, NeurIPS.
[37] Alexandra Birch, et al. Language Model Prior for Low-Resource Neural Machine Translation, 2020, EMNLP.
[38] Tomohide Shibata. Understand It in 5 Minutes!? Skimming Famous Papers: Jacob Devlin et al.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2020.
[39] Alec Radford, et al. Scaling Laws for Neural Language Models, 2020, ArXiv.
[40] Colin Raffel, et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, 2019, J. Mach. Learn. Res.
[41] Neel Sundaresan, et al. Pythia: AI-assisted Code Completion System, 2019, KDD.
[42] Nima Mesgarani, et al. Conv-TasNet: Surpassing Ideal Time–Frequency Magnitude Masking for Speech Separation, 2018, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[43] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[44] Lars Schmidt-Thieme, et al. NeuralWarp: Time-Series Similarity with Warping Networks, 2018, ArXiv.
[45] Yoshua Bengio, et al. On integrating a language model into neural machine translation, 2017, Comput. Speech Lang.