Can LLMs like GPT-4 outperform traditional AI tools in dementia diagnosis? Maybe, but not today

Recent investigations show that large language models (LLMs), specifically GPT-4, not only have remarkable capabilities in common natural language processing (NLP) tasks but also exhibit human-level performance on various professional and academic benchmarks. However, whether GPT-4 can be directly used in practical applications and replace traditional artificial intelligence (AI) tools in specialized domains requires further experimental validation. In this paper, we explore the potential of LLMs such as GPT-4 to outperform traditional AI tools in dementia diagnosis. We conduct comprehensive comparisons between GPT-4 and traditional AI tools to examine their diagnostic accuracy in a clinical setting. Experimental results on two real clinical datasets show that, although LLMs like GPT-4 demonstrate potential for future advancements in dementia diagnosis, they currently do not surpass the performance of traditional AI tools. We also evaluate the interpretability and faithfulness of GPT-4 by comparison with real doctors. Finally, we discuss the limitations of GPT-4 in its current state and propose future research directions to enhance GPT-4 in dementia diagnosis.
