CheXbert: Combining Automatic Labelers and Expert Annotations for Accurate Radiology Report Labeling Using BERT

The extraction of labels from radiology text reports enables large-scale training of medical imaging models. Existing approaches to report labeling typically rely either on sophisticated feature engineering based on medical domain knowledge or on manual annotations by experts. In this work, we introduce a BERT-based approach to medical image report labeling that exploits both the scale of available rule-based systems and the quality of expert annotations. We demonstrate superior performance of a biomedically pretrained BERT model that is first trained on annotations from a rule-based labeler and then fine-tuned on a small set of expert annotations augmented with automated backtranslation. Our final model, CheXbert, outperforms the previous best rule-based labeler with statistical significance, setting a new state of the art for report labeling on one of the largest datasets of chest X-rays.
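The two-stage data setup described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the label encoding, the function names, and the `to_pivot` / `from_pivot` translators are all hypothetical placeholders, and the BERT training itself is omitted. The sketch only shows how a large rule-labeled set and a small, backtranslation-augmented expert set might be assembled.

```python
from typing import Callable, List, Tuple

Report = str
Labels = Tuple[int, ...]          # hypothetical encoding: one class index per condition
Example = Tuple[Report, Labels]
Translator = Callable[[str], str]


def backtranslate(report: Report, to_pivot: Translator, from_pivot: Translator) -> Report:
    """Translate a report into a pivot language and back to obtain a paraphrase."""
    return from_pivot(to_pivot(report))


def augment_with_backtranslation(
    examples: List[Example], to_pivot: Translator, from_pivot: Translator
) -> List[Example]:
    """Keep every expert-labeled report and add its paraphrase with the same labels."""
    augmented = list(examples)
    for report, labels in examples:
        paraphrase = backtranslate(report, to_pivot, from_pivot)
        # Skip paraphrases identical to the source; they add no new training signal.
        if paraphrase != report:
            augmented.append((paraphrase, labels))
    return augmented


def build_training_stages(
    unlabeled_reports: List[Report],
    rule_based_labeler: Callable[[Report], Labels],
    expert_examples: List[Example],
    to_pivot: Translator,
    from_pivot: Translator,
) -> Tuple[List[Example], List[Example]]:
    """Return (stage 1, stage 2) training sets.

    Stage 1: many reports labeled automatically by the rule-based labeler.
    Stage 2: the small expert-annotated set, augmented by backtranslation.
    """
    stage1 = [(report, rule_based_labeler(report)) for report in unlabeled_reports]
    stage2 = augment_with_backtranslation(expert_examples, to_pivot, from_pivot)
    return stage1, stage2
```

In practice the translators would be calls to a machine translation model, and a classifier would be trained on the first set before being fine-tuned on the second; both of those pieces are assumed here rather than shown.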
