Introduction of Explicit Visual Saliency in Training of Deep CNNs: Application to Architectural Styles Classification

Introducing visual saliency, or interestingness, into content selection for image classification is an intensively researched topic; it has notably been applied to feature selection in feature-based methods. However, in today's leading classifiers of visual content, Deep Convolutional Neural Networks (CNNs), saliency maps have not been incorporated explicitly. Pooling in CNNs is a well-known strategy to reduce data dimensionality and computational complexity and to summarize representative features for subsequent layers. In this paper we introduce visual saliency into the pooling layers of the network to spatially filter relevant features for the deeper layers. Our experiments address a specific task: identifying Mexican architectural styles. The results are promising: the proposed approach reduces model loss and training time while keeping the same accuracy as the baseline CNN.
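The idea of filtering pooled features with a saliency map can be illustrated with a minimal sketch. The function below is a hypothetical NumPy implementation, not the paper's exact scheme: it multiplies each feature channel by a per-pixel saliency weight before a non-overlapping max pooling, so activations in low-saliency regions are suppressed before reaching deeper layers.

```python
import numpy as np

def saliency_pooling(features, saliency, pool=2):
    """Weight feature maps by a saliency map, then max-pool.

    features: (C, H, W) activation maps; saliency: (H, W) map in [0, 1].
    Illustrative sketch only; the paper's weighting may differ.
    """
    # spatially filter: low-saliency locations are damped toward zero
    weighted = features * saliency[None, :, :]
    C, H, W = weighted.shape
    # non-overlapping max pooling (stride == window size)
    return weighted.reshape(C, H // pool, pool, W // pool, pool).max(axis=(2, 4))

# toy example: one 4x4 channel of ones; saliency keeps only the right half
feats = np.ones((1, 4, 4))
sal = np.zeros((4, 4))
sal[:, 2:] = 1.0
out = saliency_pooling(feats, sal)
# out has shape (1, 2, 2): left column suppressed to 0, right column kept at 1
```

The same modulation could be placed inside any pooling layer of a standard CNN, leaving the rest of the architecture and the training procedure unchanged.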
