Geometrical Cues in Visual Saliency Models for Active Object Recognition in Egocentric Videos

In the problem of "human sensing", videos recorded with wearable cameras give an "egocentric" view of the world that captures fine details of human activities. In this paper we continue research on visual saliency for this kind of content, with the goal of recognizing "active" objects in egocentric videos. In particular, we consider a geometrical cue for the case when the central-bias hypothesis does not hold. The proposed visual saliency models are trained on observers' eye fixations and incorporated into spatio-temporal saliency models. We compare them to state-of-the-art visual saliency models using a metric based on target object recognition performance. The results are promising: they highlight the necessity of a non-centered geometric saliency cue.
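As an illustration only, the sketch below shows one way a non-centered geometric saliency cue could be learned from eye fixations: a 2D Gaussian whose mean and covariance are estimated from observers' fixation points rather than being pinned to the frame centre. This is a minimal Python/NumPy sketch under our own assumptions; the function names (fit_geometric_cue, geometric_saliency_map) and the toy fixation data are hypothetical and do not reproduce the paper's exact model or training procedure.

    import numpy as np

    def fit_geometric_cue(fixations, reg=1e-4):
        """Fit a 2D Gaussian to normalized fixation coordinates.

        `fixations` is an (N, 2) array of (x, y) points in [0, 1].
        Unlike the classical central-bias prior, the mean is learned
        from the data and need not coincide with the frame centre.
        """
        mu = fixations.mean(axis=0)
        # A small regularizer keeps the covariance invertible when
        # fixation data are sparse.
        sigma = np.cov(fixations, rowvar=False) + reg * np.eye(2)
        return mu, sigma

    def geometric_saliency_map(mu, sigma, height, width):
        """Evaluate the fitted Gaussian on the pixel grid, scaled to [0, 1]."""
        ys, xs = np.mgrid[0:height, 0:width]
        coords = np.stack([xs / width, ys / height], axis=-1)  # (H, W, 2)
        diff = coords - mu
        inv = np.linalg.inv(sigma)
        # Per-pixel squared Mahalanobis distance d^T Sigma^{-1} d.
        m = np.einsum('hwi,ij,hwj->hw', diff, inv, diff)
        smap = np.exp(-0.5 * m)
        return smap / smap.max()

    # Hypothetical fixations clustered below the frame centre, as tends
    # to happen in egocentric video where gaze follows the hands.
    fixations = np.array([[0.48, 0.66], [0.55, 0.71],
                          [0.44, 0.68], [0.52, 0.63]])
    mu, sigma = fit_geometric_cue(fixations)
    smap = geometric_saliency_map(mu, sigma, 480, 640)

Because the Gaussian mean is learned, such a cue can follow the downward shift of gaze typical of manipulation tasks in egocentric video, which is precisely the regime where a fixed central-bias prior fails.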
