Multi-Domain Balanced Sampling Improves Out-of-Distribution Generalization of Chest X-ray Pathology Prediction Models

Learning models that generalize under different distribution shifts in medical imaging has been a long-standing research challenge. There have been several proposals for efficient and robust visual representation learning among vision research practitioners, especially in the sensitive and critical biomedical domain. In this paper, we propose an idea for out-of-distribution generalization of chest X-ray pathologies that uses a simple balanced batch sampling technique. We observed that balanced sampling between the multiple training datasets improves the performance over baseline models trained without balancing. Code for this work is available on Github.

[1]  Neil D. Lawrence,et al.  Dataset Shift in Machine Learning , 2009 .

[2]  Qiang Yang,et al.  A Survey on Transfer Learning , 2010, IEEE Transactions on Knowledge and Data Engineering.

[3]  Lutz Prechelt,et al.  Early Stopping - But When? , 2012, Neural Networks: Tricks of the Trade.

[4]  Francisco Herrera,et al.  A unifying view on dataset shift in classification , 2012, Pattern Recognit..

[5]  Yoshua Bengio,et al.  How transferable are features in deep neural networks? , 2014, NIPS.

[6]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[7]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[8]  Mark D. McDonnell,et al.  Understanding Data Augmentation for Classification: When to Warp? , 2016, 2016 International Conference on Digital Image Computing: Techniques and Applications (DICTA).

[9]  Guigang Zhang,et al.  Deep Learning , 2016, Int. J. Semantic Comput..

[10]  Ronald M. Summers,et al.  ChestX-ray: Hospital-Scale Chest X-ray Database and Benchmarks on Weakly Supervised Classification and Localization of Common Thorax Diseases , 2019, Deep Learning and Convolutional Neural Networks for Medical Imaging and Clinical Informatics.

[11]  Roger G. Mark,et al.  MIMIC-CXR: A large publicly available database of labeled chest radiographs , 2019, ArXiv.

[12]  Yifan Yu,et al.  CheXpert: A Large Chest Radiograph Dataset with Uncertainty Labels and Expert Comparison , 2019, AAAI.

[13]  Joseph Paul Cohen,et al.  On the limits of cross-domain generalization in automated X-ray prediction , 2020, MIDL.

[14]  Antonio Pertusa,et al.  PadChest: A large chest x-ray image dataset with multi-label annotated reports , 2019, Medical Image Anal..

[15]  Joseph Paul Cohen,et al.  TorchXRayVision: A library of chest X-ray datasets and models , 2021, MIDL.