CheXpert++: Approximating the CheXpert labeler for Speed, Differentiability, and Probabilistic Output

It is often infeasible or impossible to obtain ground truth labels for medical data. To circumvent this, one may build rule-based or other expert-knowledge driven labelers to ingest data and yield silver labels absent any ground-truth training data. One popular such labeler is CheXpert, a labeler that produces diagnostic labels for chest X-ray radiology reports. CheXpert is very useful, but is relatively computationally slow, especially when integrated with end-to-end neural pipelines, is non-differentiable so can't be used in any applications that require gradients to flow through the labeler, and does not yield probabilistic outputs, which limits our ability to improve the quality of the silver labeler through techniques such as active learning. In this work, we solve all three of these problems with $\texttt{CheXpert++}$, a BERT-based, high-fidelity approximation to CheXpert. $\texttt{CheXpert++}$ achieves 99.81\% parity with CheXpert, which means it can be reliably used as a drop-in replacement for CheXpert, all while being significantly faster, fully differentiable, and probabilistic in output. Error analysis of $\texttt{CheXpert++}$ also demonstrates that $\texttt{CheXpert++}$ has a tendency to actually correct errors in the CheXpert labels, with $\texttt{CheXpert++}$ labels being more often preferred by a clinician over CheXpert labels (when they disagree) on all but one disease task. To further demonstrate the utility of these advantages in this model, we conduct a proof-of-concept active learning study, demonstrating we can improve accuracy on an expert labeled random subset of report sentences by approximately 8\% over raw, unaltered CheXpert by using one-iteration of active-learning inspired re-training. These findings suggest that simple techniques in co-learning and active learning can yield high-quality labelers under minimal, and controllable human labeling demands.

[1]  Yifan Yu,et al.  CheXpert: A Large Chest Radiograph Dataset with Uncertainty Labels and Expert Comparison , 2019, AAAI.

[2]  Marzyeh Ghassemi,et al.  CheXclusion: Fairness gaps in deep chest X-ray classifiers , 2020, PSB.

[3]  Christopher Ré,et al.  Snorkel: Rapid Training Data Creation with Weak Supervision , 2017, Proc. VLDB Endow..

[4]  Jon D. Patrick,et al.  Research and applications: Supervised machine learning and active learning in classification of radiology reports , 2014, J. Am. Medical Informatics Assoc..

[5]  Ronald M. Summers,et al.  TieNet: Text-Image Embedding Network for Common Thorax Disease Classification and Reporting in Chest X-Rays , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[6]  Hoifung Poon,et al.  Deep Probabilistic Logic: A Unifying Framework for Indirect Supervision , 2018, EMNLP.

[7]  Gustavo Carneiro,et al.  Producing radiologist-quality reports for interpretable artificial intelligence , 2018, ArXiv.

[8]  Peter Szolovits,et al.  Clinically Accurate Chest X-Ray Report Generation , 2019, MLHC.

[9]  V. Lempitsky,et al.  Speeding-up convolutional neural networks: A survey , 2023, Bulletin of the Polish Academy of Sciences Technical Sciences.

[10]  David Sontag,et al.  Electronic medical record phenotyping using the anchor and learn framework , 2016, J. Am. Medical Informatics Assoc..

[11]  Ronald M. Summers,et al.  ChestX-ray: Hospital-Scale Chest X-ray Database and Benchmarks on Weakly Supervised Classification and Localization of Common Thorax Diseases , 2019, Deep Learning and Convolutional Neural Networks for Medical Imaging and Clinical Informatics.

[12]  Roger G. Mark,et al.  MIMIC-CXR: A large publicly available database of labeled chest radiographs , 2019, ArXiv.

[13]  Antonio Pertusa,et al.  PadChest: A large chest x-ray image dataset with multi-label annotated reports , 2019, Medical Image Anal..

[14]  Ruimao Zhang,et al.  Cost-Effective Active Learning for Deep Image Classification , 2017, IEEE Transactions on Circuits and Systems for Video Technology.

[15]  Tao Zhang,et al.  Model Compression and Acceleration for Deep Neural Networks: The Principles, Progress, and Challenges , 2018, IEEE Signal Processing Magazine.

[16]  Wei-Hung Weng,et al.  Publicly Available Clinical BERT Embeddings , 2019, Proceedings of the 2nd Clinical Natural Language Processing Workshop.

[17]  Ronald M. Summers,et al.  NegBio: a high-performance tool for negation and uncertainty detection in radiology reports , 2017, AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science.

[18]  Ming-Wei Chang,et al.  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.

[19]  Andrew Y. Ng,et al.  CheXbert: Combining Automatic Labelers and Expert Annotations for Accurate Radiology Report Labeling Using BERT , 2020, EMNLP.