Sentiment Analysis in the Era of Large Language Models: A Reality Check

Sentiment analysis (SA) is a long-standing research area in natural language processing. It offers rich insights into human sentiments and opinions and has therefore attracted considerable interest from both academia and industry. With the advent of large language models (LLMs) such as ChatGPT, there is great potential for applying them to SA problems. However, the extent to which existing LLMs can be leveraged for different sentiment analysis tasks remains unclear. This paper provides a comprehensive investigation into the capabilities of LLMs on a range of sentiment analysis tasks, from conventional sentiment classification to aspect-based sentiment analysis and multifaceted analysis of subjective texts. We evaluate performance across 13 tasks on 26 datasets and compare the results against small language models (SLMs) trained on domain-specific datasets. Our study reveals that while LLMs demonstrate satisfactory performance on simpler tasks, they lag behind on more complex tasks that require deeper understanding or structured sentiment information. However, LLMs significantly outperform SLMs in few-shot learning settings, suggesting their potential when annotation resources are limited. We also highlight the limitations of current evaluation practices in assessing LLMs' SA abilities and propose a novel benchmark, SentiEval, for a more comprehensive and realistic evaluation. The data and code from our investigation are available at https://github.com/DAMO-NLP-SG/LLM-Sentiment.
