AI to Train AI: Using ChatGPT to Improve the Accuracy of a Therapeutic Dialogue System

In this work, we present the use of one artificial intelligence (AI) application, ChatGPT, to train another AI-based application: Terabot, a dialogue system used in the therapy of psychiatric patients. For such a domain-specific system, it is difficult to acquire large real-life data samples to enlarge the training database, because this requires recruiting more patients, which is both time-consuming and costly. To address this gap, we employed a neural large language model, ChatGPT version 3.5, to generate data solely for training our dialogue system. During initial experiments, we identified the intents that were most often misrecognized. Next, we fed ChatGPT a series of prompts that triggered the language model to generate numerous additional training entries, e.g., alternatives to the phrases collected during initial experiments with healthy users. In this way, we enlarged the training dataset by 112%. For testing in our case study, we used 2802 speech recordings originating from 32 psychiatric patients, with the accuracy of intent recognition as the evaluation metric. The speech samples were converted into text using automatic speech recognition (ASR). The analysis showed that the patients’ speech challenged the ASR module significantly, resulting in degraded speech recognition and, consequently, low accuracy of intent recognition. However, thanks to augmenting the training data with ChatGPT-generated data, the intent recognition accuracy increased by 13% in relative terms, reaching 86%. We also emulated the case of error-free ASR and showed the impact of ASR misrecognitions on the intent recognition accuracy. Our study showcases the potential of using generative language models to develop other AI-based tools, such as dialogue systems.
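
The reported figures can be related to each other with a little arithmetic. The following is a minimal sketch, not the authors' evaluation code: the function names and the example intent labels are hypothetical, and the baseline accuracy it derives is only what a 13% relative gain ending at 86% would imply, not a number stated in the abstract.

```python
def intent_accuracy(predicted, expected):
    """Fraction of utterances whose predicted intent equals the reference intent."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("at least one utterance is required")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


def relative_gain(baseline, improved):
    """Relative improvement of `improved` over `baseline` (0.13 means 13%)."""
    return (improved - baseline) / baseline


def implied_baseline(improved, gain):
    """Baseline accuracy implied by a final accuracy and a relative gain."""
    return improved / (1.0 + gain)


def augmented_size(original_size, growth):
    """Dataset size after enlarging it by `growth` (1.12 means +112%)."""
    return round(original_size * (1.0 + growth))


# Hypothetical intent labels, used only to illustrate the metric.
predicted = ["greet", "deny", "affirm", "greet"]
expected = ["greet", "affirm", "affirm", "greet"]
accuracy = intent_accuracy(predicted, expected)  # 3 of 4 correct

# Inferred, not reported: the accuracy before augmentation if a 13%
# relative gain leads to 86%.
baseline = implied_baseline(0.86, 0.13)

# Enlarging a training set by 112% slightly more than doubles it; the
# starting size of 1000 entries is an arbitrary illustration.
new_size = augmented_size(1000, 1.12)
```

Under these assumptions, the accuracy before augmentation would have been roughly 76%, so the relative gain of 13% corresponds to an absolute gain of about 10 percentage points.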
