Exploiting contextual data for event retrieval in surveillance video

Contextual information is vital for the robust extraction of semantic information in automated surveillance systems. We have developed a scene-independent framework for event detection in which 2D and 3D contextual data for the scene under surveillance are supplied through a novel, fast, and convenient interface tool. The framework also illustrates the use of integral images not only for detection, as in the classic Viola-Jones object detector, but also for efficient tracking. Finally, we provide a quantitative assessment of the system's performance at a number of physical locations using ground-truthed datasets.
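As background on the integral images mentioned above, the following is a minimal sketch of the general technique, not the framework's own implementation. An integral image (summed-area table) stores, at each position, the sum of all pixels above and to the left, so the sum over any axis-aligned rectangle can be read back with four lookups regardless of the rectangle's size. The function names `integral_image` and `box_sum` and the 3x3 sample image are illustrative assumptions.

```python
def integral_image(img):
    """Build a summed-area table padded with a leading row and column of zeros.

    sat[y][x] holds the sum of img[0..y-1][0..x-1].
    """
    rows = len(img)
    cols = len(img[0]) if rows else 0
    sat = [[0] * (cols + 1) for _ in range(rows + 1)]
    for y in range(rows):
        row_sum = 0  # running sum of the current row up to column x
        for x in range(cols):
            row_sum += img[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    return sat


def box_sum(sat, top, left, bottom, right):
    """Sum of pixels in rows top..bottom-1 and columns left..right-1 (four lookups)."""
    return sat[bottom][right] - sat[top][right] - sat[bottom][left] + sat[top][left]


# Hypothetical 3x3 grayscale patch used only for illustration.
image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
sat = integral_image(image)
total = box_sum(sat, 0, 0, 3, 3)         # whole image
lower_right = box_sum(sat, 1, 1, 3, 3)   # 2x2 block: 5 + 6 + 8 + 9
```

Because each rectangle sum costs a constant number of operations, the same table can serve both a detector evaluating many rectangular features and a tracker comparing candidate regions from frame to frame.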
