Gestalt-based Contour Weights Improve Scene Categorization by CNNs

Humans can accurately recognize natural scenes from line drawings, which consist solely of contour-based shape cues. Deep learning strategies for this complex task, however, have thus far been applied directly to photographs, exploiting all the cues available in colour images at the pixel level. Here we report the results of fine-tuning off-the-shelf pre-trained Convolutional Neural Networks (CNNs) to perform scene classification given only contour information as input. To do so, we use the Iverson-Zucker logical/linear framework to obtain line drawings from popular scene categorization databases, including an artist's scene database and MIT67. We demonstrate a high level of performance despite the absence of colour, texture and shading information. We also show that including medial-axis-based contour salience weights boosts performance further, adding useful information that CNNs do not appear to exploit when trained on contours alone.
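
As a rough illustration of how contour salience weights could be presented to a CNN alongside the contours themselves, the sketch below stacks a binary contour map with salience-weighted copies of it as separate input channels. The abstract does not specify the input encoding, so the channel layout, the function name `stack_contour_channels`, and the random toy salience scores are all assumptions made for illustration.

```python
import numpy as np

def stack_contour_channels(contours, salience_maps):
    """Stack a binary contour map with salience-weighted copies (channel-last).

    Hypothetical helper: the channel layout is an assumption, not the
    encoding described in the abstract.
    """
    contours = np.asarray(contours, dtype=np.float32)
    channels = [contours]
    for s in salience_maps:
        s = np.asarray(s, dtype=np.float32)
        if s.shape != contours.shape:
            raise ValueError("salience map must match the contour map shape")
        # Clip scores to [0, 1]; off-contour pixels stay at 0 after weighting.
        channels.append(contours * np.clip(s, 0.0, 1.0))
    return np.stack(channels, axis=-1)

# Toy line drawing: a single horizontal contour on a 4x4 grid.
contours = np.zeros((4, 4))
contours[1, :] = 1.0

# Placeholder salience scores; real ones would come from a medial-axis analysis.
rng = np.random.default_rng(0)
salience = rng.random((4, 4))

x = stack_contour_channels(contours, [salience, 1.0 - salience])
print(x.shape)
```

The resulting three-channel array has the same layout as an RGB image, so under this assumption it could be passed to a pre-trained network without changing its first layer.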
