Wide or Narrow? A Visual Attention Inspired Model for View-Type Classification

Emerging research has shown that the view-type of a photo is relevant not only to data science, for example in estimating the sentiment evoked by sightseeing spots, but also to social science studies of human emotions and behaviors. These potential uses raise a challenging problem: automatically classifying a photo's view-type as wide or narrow. In this paper, we present a computational model, inspired by the human visual system, for this classification. We identify two cues that represent visual attention: a focus cue and a scale cue. The focus cue is modeled in the frequency domain using the nonsubsampled contourlet transform (NSCT) and speeded up robust features (SURF). The scale cue is modeled by defining the spatial size and the conceptual size of an object in the image, measured using AdobeBING and a convolutional neural network, respectively. Integrating the focus and scale models yields a robust scheme for this non-trivial task. Experiments on a newly established dataset of 5050 natural images show that the proposed method outperforms state-of-the-art approaches.
