Medically Aware GPT-3 as a Data Generator for Medical Dialogue Summarization

In medical dialogue summarization, summaries must be coherent and must capture all the medically relevant information in the dialogue. However, learning effective summarization models requires large amounts of labeled data, which is especially hard to obtain in this domain. We present an algorithm for creating synthetic training data with an explicit focus on capturing medically relevant information. Using GPT-3 as the backbone of our algorithm, we scale 210 human-labeled examples to yield results comparable to those obtained with 6400 human-labeled examples (~30x) by leveraging low-shot learning and an ensemble method. In detailed experiments, we show that this approach produces high-quality training data that can further be combined with human-labeled data to train models whose summaries are strongly preferred over those from models trained on human data alone, in terms of both medical accuracy and coherence.
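As a rough illustration of the kind of pipeline the abstract describes, the sketch below uses few-shot prompting of GPT-3 over a small pool of labeled dialogue-summary pairs, plus a simple ensemble that keeps the candidate summary preserving the most medical terms from the source dialogue. The helper gpt3_complete, the medical term set, and the coverage-based selection rule are assumptions for illustration only, not the paper's exact method.

    # Minimal sketch (assumptions, not the paper's exact pipeline): few-shot prompting
    # of GPT-3 to label unlabeled dialogues with synthetic summaries, with a simple
    # ensemble step that prefers candidates covering more medical terms.
    import random
    from typing import Callable, List, Tuple

    def build_prompt(examples: List[Tuple[str, str]], dialogue: str) -> str:
        """Concatenate a few labeled (dialogue, summary) pairs as in-context examples."""
        parts = [f"Dialogue: {d}\nSummary: {s}\n" for d, s in examples]
        parts.append(f"Dialogue: {dialogue}\nSummary:")
        return "\n".join(parts)

    def medical_coverage(summary: str, dialogue: str, medical_terms: set) -> int:
        """Count medical terms mentioned in the dialogue that the candidate summary preserves."""
        terms_in_dialogue = {t for t in medical_terms if t in dialogue.lower()}
        return sum(1 for t in terms_in_dialogue if t in summary.lower())

    def generate_synthetic_pair(
        dialogue: str,
        labeled_pool: List[Tuple[str, str]],
        gpt3_complete: Callable[[str], str],  # assumed wrapper around a GPT-3 completion call
        medical_terms: set,
        n_prompts: int = 5,
        shots: int = 4,
    ) -> Tuple[str, str]:
        """Ensemble over several few-shot prompts; keep the most medically complete candidate."""
        candidates = []
        for _ in range(n_prompts):
            shots_sample = random.sample(labeled_pool, k=min(shots, len(labeled_pool)))
            candidate = gpt3_complete(build_prompt(shots_sample, dialogue)).strip()
            candidates.append(candidate)
        best = max(candidates, key=lambda c: medical_coverage(c, dialogue, medical_terms))
        return dialogue, best  # becomes one synthetic training example

The resulting synthetic (dialogue, summary) pairs would then be mixed with the small human-labeled set to train a downstream summarization model, as the abstract describes.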
