Domain-adapted large language models for classifying nuclear medicine reports

With the growing use of transformer-based language models in medicine, it is unclear how well these models generalize to nuclear medicine, which has domain-specific vocabulary and unique reporting styles. In this study, we evaluated the value of domain adaptation in nuclear medicine by adapting language models to predict five-point Deauville scores from clinical 18F-fluorodeoxyglucose (FDG) PET/CT reports. We retrospectively retrieved 4542 text reports and 1664 images from FDG PET/CT lymphoma exams performed from 2008 to 2018 in our clinical imaging database. Deauville scores were removed from the reports, and the remaining text was used as the model input. Multiple general-purpose transformer language models were used to classify the reports into Deauville scores 1-5. We then adapted the models to the nuclear medicine domain using masked language modeling and assessed the impact of adaptation on classification performance. The language models were compared against vision models, a multimodal vision-language model, and a nuclear medicine physician using seven-fold Monte Carlo cross-validation; means and standard deviations are reported. Domain adaptation improved all language models. For example, BERT improved from 61.3% to 65.7% five-class accuracy following domain adaptation. The best-performing model (domain-adapted RoBERTa) achieved a five-class accuracy of 77.4%, which exceeded the physician's performance (66%) and the best vision model's performance (48.1%), and was similar to the multimodal model's performance (77.2%). Domain adaptation improved the performance of large language models in interpreting nuclear medicine text reports.
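As a rough illustration of the two-stage approach described above, the sketch below uses Hugging Face Transformers to continue pretraining RoBERTa with masked language modeling on unlabeled report text, then fine-tune the adapted weights as a five-class Deauville classifier. The checkpoint names, output paths, hyperparameters, and the `reports`/`labels` variables are illustrative assumptions, not the authors' actual configuration.

```python
"""Minimal sketch: MLM domain adaptation, then 5-class fine-tuning.
All names and hyperparameters here are assumptions for illustration."""
import torch
from transformers import (
    AutoTokenizer, AutoModelForMaskedLM, AutoModelForSequenceClassification,
    DataCollatorForLanguageModeling, DataCollatorWithPadding,
    Trainer, TrainingArguments,
)

# Placeholder data: de-identified report texts and Deauville scores 1-5
# mapped to class indices 0-4 (hypothetical examples).
reports = ["PET/CT: interval decrease in FDG-avid mediastinal adenopathy ..."]
labels = [2]

class ReportDataset(torch.utils.data.Dataset):
    """Tokenized reports; labels are optional (omitted for the MLM stage)."""
    def __init__(self, texts, tokenizer, labels=None):
        self.enc = tokenizer(texts, truncation=True, max_length=512)
        self.labels = labels
    def __len__(self):
        return len(self.enc["input_ids"])
    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        if self.labels is not None:
            item["labels"] = torch.tensor(self.labels[i])
        return item

tokenizer = AutoTokenizer.from_pretrained("roberta-base")

# Stage 1: domain adaptation. Continue pretraining on unlabeled nuclear
# medicine reports with the standard 15% masked-token objective.
mlm_model = AutoModelForMaskedLM.from_pretrained("roberta-base")
Trainer(
    model=mlm_model,
    args=TrainingArguments(output_dir="roberta-nucmed",
                           num_train_epochs=3,
                           per_device_train_batch_size=8),
    train_dataset=ReportDataset(reports, tokenizer),
    data_collator=DataCollatorForLanguageModeling(
        tokenizer, mlm=True, mlm_probability=0.15),
).train()
mlm_model.save_pretrained("roberta-nucmed")
tokenizer.save_pretrained("roberta-nucmed")

# Stage 2: fine-tune the adapted encoder as a five-class Deauville
# classifier (a new classification head is initialized on top).
clf = AutoModelForSequenceClassification.from_pretrained(
    "roberta-nucmed", num_labels=5)
Trainer(
    model=clf,
    args=TrainingArguments(output_dir="roberta-nucmed-clf",
                           num_train_epochs=5,
                           per_device_train_batch_size=8),
    train_dataset=ReportDataset(reports, tokenizer, labels),
    data_collator=DataCollatorWithPadding(tokenizer),
).train()
```

A common design choice with this kind of pipeline, consistent with the study's setup, is to run the MLM stage on all available unlabeled report text while restricting the classification stage to the labeled training split within each cross-validation fold, so that adaptation does not leak label information.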
