Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning
Zhi-Qi Cheng, Jingdong Sun, Zebang Cheng, Yuxiang Lin, Jun-Yan He, Kai Wang, Zheng Lian, Xiaojiang Peng, Alexander G. Hauptmann