Automatic Learning of Background Semantics in Generic Surveilled Scenes

Advanced surveillance systems for behavior recognition in outdoor traffic scenes depend strongly on the particular configuration of the scenario. Scene-independent trajectory analysis techniques statistically infer semantics in locations where motion occurs, but such inferences are typically limited to detecting abnormality. It is therefore worthwhile to develop methods that automatically categorize more specific semantic regions. State-of-the-art approaches for unsupervised scene labeling exploit trajectory data to segment areas such as sources, sinks, or waiting zones. Our method additionally incorporates scene-independent knowledge to assign more meaningful labels such as crosswalks, sidewalks, or parking spaces. First, a spatiotemporal scene model is obtained from trajectory analysis. Subsequently, a so-called GI-MRF inference process reinforces spatial coherence and incorporates taxonomy-guided smoothness constraints. The method achieves automatic and effective labeling of conceptual regions in urban scenarios and is robust to tracking errors. Experimental validation on five surveillance databases assesses the generality and accuracy of the segmentations. The resulting scene models are used for model-based behavior analysis.
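
The abstract does not specify how the GI-MRF inference is formulated, so the following is only a minimal sketch of the general idea of MRF-based label smoothing with a label-dependent penalty. It assumes a grid of cells, a per-cell data cost for each label, and a pairwise cost that is lower for labels taken to be close in a taxonomy. The label set, the cost table `PAIR_COST`, the function `icm`, and the use of iterated conditional modes as the optimizer are all hypothetical choices made for illustration, not the authors' method.

```python
# Illustrative sketch only: iterated conditional modes (ICM) on a 4-connected
# grid MRF. Labels and costs are hypothetical, not taken from the paper.

LABELS = ["road", "crosswalk", "sidewalk"]

# Hypothetical pairwise penalties: labels assumed to be closer in a taxonomy
# are cheaper to place next to each other.
PAIR_COST = {
    ("road", "crosswalk"): 0.5,
    ("crosswalk", "sidewalk"): 0.5,
    ("road", "sidewalk"): 1.0,
}


def pair_cost(a, b):
    """Symmetric smoothness cost between two neighbouring labels."""
    if a == b:
        return 0.0
    return PAIR_COST.get((a, b), PAIR_COST.get((b, a)))


def icm(unary, n_sweeps=5, weight=1.0):
    """Smooth a label grid.

    unary[r][c] is a dict mapping each label to its data cost at cell (r, c).
    Returns a grid (list of lists) of labels.
    """
    rows, cols = len(unary), len(unary[0])
    # Start from the label with the lowest data cost in each cell.
    labels = [
        [min(LABELS, key=lambda l: unary[r][c][l]) for c in range(cols)]
        for r in range(rows)
    ]
    for _ in range(n_sweeps):
        changed = False
        for r in range(rows):
            for c in range(cols):
                neighbours = [
                    labels[rr][cc]
                    for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                    if 0 <= rr < rows and 0 <= cc < cols
                ]

                def energy(l):
                    # Data cost plus weighted smoothness cost to current neighbours.
                    return unary[r][c][l] + weight * sum(
                        pair_cost(l, n) for n in neighbours
                    )

                best = min(LABELS, key=energy)
                if best != labels[r][c]:
                    labels[r][c] = best
                    changed = True
        if not changed:
            break
    return labels
```

In this sketch, an isolated cell whose data cost weakly favors one label is relabeled to match its neighbours when the smoothness weight is large enough, which is the kind of spatial coherence a smoothness term is meant to encourage. ICM only finds a local minimum; other optimizers could be substituted.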
