Perceptual multi-channel visual feature fusion for scene categorization

Abstract Recognizing scenes from a variety of categories is an indispensable but challenging task in computer vision and intelligent systems. In this work, we propose a novel image kernel based on human gaze shifting, which aims to model how humans perceive visually or semantically salient regions within a scene. More specifically, we first design a weakly supervised embedding algorithm that projects local image features (graphlets in this work) onto a pre-defined semantic space. Each graphlet is thereby described by multiple visual features at both the low level and the high level. It is generally acknowledged that humans attend to only a few regions within a scene. We therefore formulate a sparsity-constrained graphlet ranking algorithm that incorporates both low-level and high-level visual cues. According to human visual perception, the top-ranked graphlets are either visually or semantically salient. We sequentially connect them into a path that mimics human gaze shifting. Lastly, a so-called gaze shifting kernel (GSK) is calculated from the paths learned on a collection of scene images, and a kernel SVM is employed to predict the scene categories. Comprehensive experiments on a series of well-known scene image sets show the competitiveness and robustness of our GSK. We also demonstrate that the predicted paths are highly consistent with real human gaze shifting paths.
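The pipeline described above (rank graphlets, keep the top few, link them into a path, compare paths with a kernel) can be illustrated with a minimal sketch. It is not the GSK formulation itself, which the abstract does not specify. The function names `top_k_path`, `rbf` and `path_kernel`, the use of precomputed saliency scores, and the position-wise Gaussian similarity are all assumptions made for illustration only.

```python
import math


def top_k_path(features, scores, k):
    """Keep the k highest-scoring graphlets, ordered by descending score.

    The scores stand in for the output of a ranking algorithm; the
    sparsity-constrained ranking itself is NOT implemented here.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return [features[i] for i in order]


def rbf(x, y, gamma=1.0):
    """Gaussian (RBF) similarity between two feature vectors."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * d2)


def path_kernel(path_a, path_b, gamma=1.0):
    """Hypothetical path kernel: mean RBF similarity between graphlets
    at the same position along two equal-length paths."""
    if len(path_a) != len(path_b) or not path_a:
        raise ValueError("paths must be non-empty and of equal length")
    return sum(rbf(a, b, gamma) for a, b in zip(path_a, path_b)) / len(path_a)


# Toy example: three graphlets with 2-D features and made-up saliency scores.
features = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
scores = [0.1, 0.9, 0.5]
path = top_k_path(features, scores, k=2)
```

A Gram matrix built by evaluating `path_kernel` on every pair of training images could then be passed to any kernel SVM implementation that accepts a precomputed kernel.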
