Opportunities and Challenges for ChatGPT and Large Language Models in Biomedicine and Health

ChatGPT has drawn considerable attention from both the general public and domain experts with its remarkable text generation capabilities, which has in turn led to the emergence of diverse applications in biomedicine and health. In this work, we examine the applications of large language models (LLMs), such as ChatGPT, in biomedicine and health. Specifically, we explore the areas of biomedical information retrieval, question answering, medical text summarization, information extraction, and medical education, and investigate whether LLMs possess the transformative power to revolutionize these tasks or whether the distinct complexities of the biomedical domain present unique challenges. Following an extensive literature survey, we find that significant advances have been made in text generation tasks, surpassing the previous state-of-the-art methods; for other applications, the advances have been modest. Overall, LLMs have not yet revolutionized biomedicine, but recent rapid progress indicates that such methods hold great potential to provide valuable means for accelerating discovery and improving health. We also find that the use of LLMs, such as ChatGPT, in biomedicine and health entails various risks and challenges, including fabricated information in generated responses, as well as legal and privacy concerns associated with sensitive patient data. We believe this first-of-its-kind survey can provide a comprehensive overview for biomedical researchers and healthcare practitioners on the opportunities and challenges of using ChatGPT and other LLMs to transform biomedicine and health.
