Are ChatGPT and GPT-4 General-Purpose Solvers for Financial Text Analytics? A Study on Several Typical Tasks

The most recent large language models (LLMs), such as ChatGPT and GPT-4, have shown exceptional capabilities as generalist models, achieving state-of-the-art performance on a wide range of NLP tasks with little or no adaptation. How effective are such models in the financial domain? Answering this basic question would have a significant impact on many downstream financial analytical tasks. In this paper, we conduct an empirical study and provide experimental evidence of their performance on a wide variety of financial text analytical problems, using eight benchmark datasets from five categories of tasks. We report both the strengths and limitations of the current models by comparing them to state-of-the-art fine-tuned approaches and recently released domain-specific pretrained models. We hope our study helps clarify the capabilities of existing models in the financial domain and facilitates further improvements.
