Distilling Large Language Models for Biomedical Knowledge Extraction: A Case Study on Adverse Drug Events

Large language models (LLMs), such as GPT-4, have demonstrated remarkable capabilities across a wide range of tasks, including health applications. In this paper, we study how LLMs can be used to scale biomedical knowledge curation. We find that while LLMs already possess reasonable competence in structuring biomedical text, distilling them into a task-specific student model through self-supervised learning yields substantial gains over out-of-the-box LLMs, with additional advantages in cost, efficiency, and white-box model access. We conduct a case study on adverse drug event (ADE) extraction, an important area for improving care. On standard ADE extraction evaluation, a PubMedBERT model distilled from GPT-3.5 attained accuracy comparable to supervised state-of-the-art models without using any labeled data. Despite being over 1,000 times smaller, the distilled model outperformed its teacher GPT-3.5 by over 6 absolute points in F1 and GPT-4 by over 5 absolute points. Ablation studies on the choice of student model (e.g., PubMedBERT vs. BioGPT) and on the ADE extraction architecture shed light on best practices for biomedical knowledge extraction. Distillation yielded similar gains on other standard biomedical knowledge extraction tasks, such as gene-disease associations and protected health information, further illustrating the promise of this approach.
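The abstract does not spell out the distillation recipe, so the following is only a minimal sketch under stated assumptions: the teacher LLM is prompted to return (drug, adverse event) pairs for each unlabeled sentence, and the student is a token tagger trained on the resulting silver labels. The `micro_f1` helper assumes exact-match, pair-level scoring, which may differ from the evaluation protocol actually used. All names here (`tokenize`, `tag_mention`, `build_silver_data`, `micro_f1`, `toy_teacher`) are hypothetical, and `toy_teacher` is a hard-coded stand-in for a real GPT-3.5 call.

```python
import re
from typing import Callable, Iterable

# A (drug, adverse event) pair given as surface strings.
Pair = tuple[str, str]


def tokenize(text: str) -> list[str]:
    # Simple word/punctuation tokenizer (assumption; a real student such as
    # PubMedBERT would use its own subword tokenizer plus label alignment).
    return re.findall(r"\w+|[^\w\s]", text)


def tag_mention(tokens: list[str], tags: list[str], mention: str, label: str) -> None:
    """Mark the first case-insensitive match of `mention` with BIO tags."""
    span = [t.lower() for t in tokenize(mention)]
    if not span:
        return
    lowered = [t.lower() for t in tokens]
    for i in range(len(tokens) - len(span) + 1):
        if lowered[i:i + len(span)] == span:
            tags[i] = f"B-{label}"
            for j in range(i + 1, i + len(span)):
                tags[j] = f"I-{label}"
            return  # later occurrences of the same string are left untagged


def build_silver_data(
    sentences: Iterable[str], teacher: Callable[[str], list[Pair]]
) -> list[tuple[list[str], list[str]]]:
    """Label unlabeled sentences with the teacher and convert its (drug, event)
    pairs into token-level BIO tags that a student tagger can be trained on."""
    examples = []
    for sentence in sentences:
        tokens = tokenize(sentence)
        tags = ["O"] * len(tokens)
        for drug, event in teacher(sentence):
            tag_mention(tokens, tags, drug, "DRUG")
            tag_mention(tokens, tags, event, "ADE")
        examples.append((tokens, tags))
    return examples


def micro_f1(predicted: list[set[Pair]], gold: list[set[Pair]]) -> float:
    """Micro-averaged F1 over exact-match (drug, event) pairs across documents."""
    tp = sum(len(p & g) for p, g in zip(predicted, gold))
    n_pred = sum(len(p) for p in predicted)
    n_gold = sum(len(g) for g in gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)


def toy_teacher(sentence: str) -> list[Pair]:
    # Hypothetical stand-in for a prompted GPT-3.5 call; hard-coded for the demo.
    if "cisplatin" in sentence.lower():
        return [("cisplatin", "acute renal failure")]
    return []


if __name__ == "__main__":
    silver = build_silver_data(
        ["Acute renal failure developed after cisplatin therapy."], toy_teacher
    )
    print(silver[0][1])
    # ['B-ADE', 'I-ADE', 'I-ADE', 'O', 'O', 'B-DRUG', 'O', 'O']
    print(micro_f1([{("cisplatin", "acute renal failure"), ("aspirin", "rash")}],
                   [{("cisplatin", "acute renal failure")}]))
```

Tagging only the first exact match keeps the sketch short, but it also means any span the teacher paraphrases instead of copying verbatim is silently dropped; a real pipeline would need to decide how to handle that source of label noise. Because F1 is reported on a 0 to 100 scale in the abstract, a gap of "6 absolute points" corresponds to a difference of 0.06 in the value `micro_f1` returns.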
