Gaze Shifting Kernel: Engineering Perceptually- Aware Features for Scene Categorization

In this paper, we propose a novel gaze shifting kernel for scene image categorization, focusing on discovering the mechanism of humans perceiving visually/semantically salient regions in a scene. First, a weakly supervised embedding algorithm projects the local image descriptors (i.e., graphlets) into a pre-specified semantic space. Afterward, each graphlet can be represented by multiple visual features at both low-level and high-level. As humans typically attend to a small fraction of regions in a scene, a sparsity-constrained graphlet ranking algorithm is proposed to dynamically integrate both the low-level and the high-level visual cues. The top-ranked graphlets are either visually or semantically salient according to human perception. They are linked into a path to simulate human gaze shifting. Finally, we calculate the gaze shifting kernel (GSK) based on the discovered paths from a set of images. Experiments on the USC scene and the ZJU aerial image data sets demonstrate the competitiveness of our GSK, as well as the high consistency of the predicted path with real human gaze shifting path.

[1]  Xiao Liu,et al.  Probabilistic Graphlet Transfer for Photo Cropping , 2013, IEEE Transactions on Image Processing.

[2]  Nuno Vasconcelos,et al.  Discriminant Saliency for Visual Recognition from Cluttered Scenes , 2004, NIPS.

[3]  Qiang Ji,et al.  Probabilistic gaze estimation without active personal calibration , 2011, CVPR 2011.

[4]  Jean-Marc Odobez,et al.  Person independent 3D gaze estimation from remote RGB-D cameras , 2013, 2013 IEEE International Conference on Image Processing.

[5]  Mohan M. Trivedi,et al.  Head Pose Estimation in Computer Vision: A Survey , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6]  Nuno Vasconcelos,et al.  Integrated learning of saliency, complex features, and object detectors from cluttered scenes , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[7]  Yihong Gong,et al.  Linear spatial pyramid matching using sparse coding for image classification , 2009, CVPR.

[8]  Jean-Marc Odobez,et al.  Gaze estimation from multimodal Kinect data , 2012, 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[9]  Benjamin Z. Yao,et al.  Introduction to a Large-Scale General Purpose Ground Truth Database: Methodology, Annotation Tool and Benchmarks , 2007, EMMCVPR.

[10]  Frédéric Jurie,et al.  Learning Saliency Maps for Object Categorization , 2006 .

[11]  Christof Koch,et al.  Image Signature: Highlighting Sparse Salient Regions , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12]  Moshe Eizenman,et al.  General theory of remote gaze estimation using the pupil center and corneal reflections , 2006, IEEE Transactions on Biomedical Engineering.

[13]  Antonio Torralba,et al.  Top-down control of visual attention in object detection , 2003, Proceedings 2003 International Conference on Image Processing (Cat. No.03CH37429).

[14]  Takahiro Okabe,et al.  A Head Pose-free Approach for Appearance-based Gaze Estimation , 2011, BMVC.

[15]  Yasuo Kuniyoshi,et al.  Discriminative spatial pyramid , 2011, CVPR 2011.

[16]  Zaïd Harchaoui,et al.  Image Classification with Segmentation Graph Kernels , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[17]  G. Sapiro,et al.  A collaborative framework for 3D alignment and classification of heterogeneous subvolumes in cryo-electron tomography. , 2013, Journal of structural biology.

[18]  Atsushi Nakazawa,et al.  Point of Gaze Estimation through Corneal Surface Reflection in an Active Illumination Environment , 2012, ECCV.

[19]  Yi Yang,et al.  Discovering Discriminative Graphlets for Aerial Image Categories Recognition , 2013, IEEE Transactions on Image Processing.

[20]  Laurent Itti,et al.  Ieee Transactions on Pattern Analysis and Machine Intelligence 1 Rapid Biologically-inspired Scene Classification Using Features Shared with Visual Attention , 2022 .

[21]  Hao Su,et al.  Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification , 2010, NIPS.

[22]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[23]  Zhengyou Zhang,et al.  3D Deformable Face Tracking with a Commodity Depth Camera , 2010, ECCV.

[24]  Yihong Gong,et al.  Locality-constrained Linear Coding for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[25]  Fei-Fei Li,et al.  Combining randomization and discrimination for fine-grained image categorization , 2011, CVPR 2011.

[26]  Tsuhan Chen,et al.  Determining Patch Saliency Using Low-Level Context , 2008, ECCV.