ChatGPT: Beginning of an End of Manual Linguistic Data Annotation? Use Case of Automatic Genre Identification

ChatGPT has shown strong capabilities in natural language generation tasks, which naturally leads researchers to explore where its abilities end. In this paper, we examine whether ChatGPT can be used for zero-shot text classification, specifically automatic genre identification. We compare ChatGPT with a multilingual XLM-RoBERTa language model that was fine-tuned on datasets manually annotated with genres. The models are compared on test sets in two languages: English and Slovenian. Results show that ChatGPT outperforms the fine-tuned model when applied to a dataset that neither model had seen before. Even when applied to Slovenian, an under-resourced language, ChatGPT performs no worse than on English. However, if the model is prompted fully in Slovenian, performance drops significantly, which shows the current limitations of using ChatGPT for smaller languages. These results lead us to question whether this is the beginning of the end of laborious manual annotation campaigns, even for smaller languages such as Slovenian.
