Statistical Analysis of Visual Attentional Patterns for Video Surveillance

We show that how people observe video sequences, and not only what they observe, is important for understanding and predicting human activities. In this study, we consider 36 surveillance videos organized into four categories (confront, nothing, fight, play). The videos are observed by 19 people, ten of whom are experienced operators and nine of whom are novices, and the gaze trajectories of both populations are recorded with an eye-tracking device. Because experienced operators have been shown to be better at predicting violence in surveillance footage, our aim is to distinguish the two classes of observers and to highlight in which respects expert operators differ from novices. By extracting spatio-temporal features from the eye-tracking data and training standard machine learning classifiers, we are able to discriminate between the two groups of subjects with an average accuracy of 80.26%. The underlying idea is that expert operators focus on a few regions of the scene, sampling them with high frequency and low predictability. This can be thought of as a first step toward advanced automated analysis of video surveillance footage, in which machines imitate the attentive mechanisms of humans as closely as possible.
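
To make the notion of spatio-temporal gaze features concrete, the following is a minimal sketch of how the three qualities named above (focus on few regions, sampling frequency, predictability) might be quantified from a gaze trajectory. It is an illustration under stated assumptions, not the feature set used in the study: the grid discretization, the function names (`grid_cells`, `gaze_features`) and the three measures are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def grid_cells(gaze, width, height, n=4):
    """Map (x, y) gaze samples in pixels to cell indices of an n-by-n grid.

    The grid is an assumption of this sketch, used as a stand-in for
    'regions of the scene'.
    """
    cells = []
    for x, y in gaze:
        col = min(int(x / width * n), n - 1)
        row = min(int(y / height * n), n - 1)
        cells.append(row * n + col)
    return cells


def entropy(counts):
    """Shannon entropy (in bits) of a collection of occurrence counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def gaze_features(gaze, width, height, n=4):
    """Return (spatial_entropy, switch_rate, transition_entropy).

    Requires at least two gaze samples.
    """
    cells = grid_cells(gaze, width, height, n)
    pairs = list(zip(cells, cells[1:]))

    # Spatial focus: low entropy means gaze is concentrated on few regions.
    spatial_entropy = entropy(Counter(cells).values())

    # Sampling frequency: fraction of consecutive samples that change region.
    switch_rate = sum(a != b for a, b in pairs) / len(pairs)

    # Predictability: conditional entropy H(next | current), computed as
    # H(current, next) - H(current). Higher values mean the next region
    # is harder to predict from the current one.
    transition_entropy = (entropy(Counter(pairs).values())
                          - entropy(Counter(a for a, _ in pairs).values()))

    return spatial_entropy, switch_rate, transition_entropy
```

A feature vector of this kind, computed per observer and per video, could then be passed to any standard classifier to separate experts from novices. The grid size `n` is a free parameter of the sketch and would need tuning against real eye-tracking data.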
