Crowd saliency prediction with optimal feature combinations

Crowd saliency prediction refers to predicting where people look in crowd scenes. Humans have a remarkable ability to rapidly direct their gaze toward visual information of interest when viewing a scene. To date, research has focused on two open questions: which types of features are representative of crowd saliency, and which type of learning model is robust for crowd saliency prediction. In this paper, we propose a Random Forest (RF) based crowd saliency prediction approach with optimal feature combination, the Feature Combination Selection for Crowd Saliency (FCSCS) framework. Specifically, we first define two representative crowd saliency features: FaceSizeDiff and FacePoseDiff. Next, we adopt the RF algorithm to construct our saliency learning model. Then, we evaluate the performance of crowd saliency prediction classifiers with different feature combinations (fifteen combinations in our experiments). The candidate features include low-level features (color, intensity, orientation), four existing crowd features (face size, face density, frontal face, profile face), and the two newly defined features (FaceSizeDiff and FacePoseDiff). Finally, we identify the feature combination that is most suitable for crowd saliency prediction. Extensive experiments and empirical evaluation demonstrate the satisfactory performance of our approach.
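The selection step described above can be illustrated with a minimal Python sketch. It is not the FCSCS implementation: the grouping of features into four groups, the names `FEATURE_GROUPS`, `enumerate_combinations`, `select_best`, and `count_columns`, and the placeholder scorer are all assumptions made for illustration. The abstract does not state which fifteen combinations were evaluated; exhaustively enumerating the non-empty subsets of four hypothetical groups simply happens to give the same count. In practice the scorer would be something like the held-out performance of an RF classifier trained on the selected features.

```python
from itertools import combinations
from typing import Callable, Dict, Iterator, List, Tuple

# Hypothetical grouping of the features named in the abstract.
# The paper's actual fifteen combinations are not listed in this text.
FEATURE_GROUPS: Dict[str, List[str]] = {
    "low_level": ["color", "intensity", "orientation"],
    "existing_crowd": ["face_size", "face_density", "frontal_face", "profile_face"],
    "FaceSizeDiff": ["FaceSizeDiff"],
    "FacePoseDiff": ["FacePoseDiff"],
}


def enumerate_combinations(groups: Dict[str, List[str]]) -> Iterator[Tuple[str, ...]]:
    """Yield every non-empty subset of group names, smallest subsets first."""
    names = list(groups)
    for size in range(1, len(names) + 1):
        yield from combinations(names, size)


def select_best(
    groups: Dict[str, List[str]],
    score: Callable[[Tuple[str, ...]], float],
) -> Tuple[Tuple[str, ...], float]:
    """Return the subset with the highest score under a caller-supplied scorer."""
    best = max(enumerate_combinations(groups), key=score)
    return best, score(best)


def count_columns(subset: Tuple[str, ...]) -> float:
    """Placeholder scorer: counts selected feature columns.

    This is NOT a saliency metric and carries no information about which
    features work best; it only makes the sketch runnable. A real scorer
    would train and evaluate a classifier on the selected columns.
    """
    return float(sum(len(FEATURE_GROUPS[name]) for name in subset))


if __name__ == "__main__":
    candidates = list(enumerate_combinations(FEATURE_GROUPS))
    print(len(candidates))  # 15 non-empty subsets of 4 groups
    best_subset, best_score = select_best(FEATURE_GROUPS, count_columns)
    print(best_subset, best_score)
```

Passing the scorer in as a callable keeps the enumeration independent of the learning model, so the same loop could be reused with any classifier and evaluation metric.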
