Noise or Signal: The Role of Image Backgrounds in Object Recognition

We assess the tendency of state-of-the-art object recognition models to depend on signals from image backgrounds. We create a toolkit for disentangling foreground and background signals in ImageNet images, and find that (a) models can achieve non-trivial accuracy by relying on the background alone, (b) models often misclassify images even in the presence of correctly classified foregrounds (up to 87.5% of the time with adversarially chosen backgrounds), and (c) more accurate models tend to depend on backgrounds less. Our analysis of backgrounds brings us closer to understanding which correlations machine learning models use, and how those correlations determine models' out-of-distribution performance.
