Specifying, Interpreting and Detecting High-level, Spatio-Temporal Composite Events in Single and Multi-Camera Systems

Detecting and tracking moving objects are important and challenging problems that have attracted much attention from the research community. In most cases, however, tracking objects is not enough. The goal should be to detect occurrences of events of interest, which matters for applications such as video surveillance, video browsing and indexing. Event detection, in turn, introduces the challenge of giving users the flexibility to specify customized events of varying complexity and to enter them into a system in a generic way. Event definitions should not be pre-defined and hard-coded. We introduce a spatio-temporal event detection system that lets users specify multiple composite events of high complexity and then detects their occurrences automatically. Events can be defined on a single camera view or across multiple camera views. Semantically higher-level event scenarios can be built from building blocks, which we call primitive events, by combining them with operators. More importantly, newly defined composite events can themselves be combined with each other. This layered structure makes it possible to define events of increasing complexity. The event definitions are written to an XML file, which is then parsed and communicated to the tracking engines running on the videos of the corresponding cameras. With the proposed system, we move from detecting "a person exiting the building" to detecting "a person coming from the south corridor of the building and then exiting the building".
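The layered idea of primitive events combined by operators, with composites reusable as operands, can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the abstract does not list the operators or the XML schema, so the operator names (SEQUENCE, AND, OR), the max_gap time window, the event labels and the XML element and attribute names are all hypothetical. The sketch also works on plain timestamp logs and ignores object identity, which a real tracker would have to match across cameras.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass(frozen=True)
class Primitive:
    """A building-block event reported by one camera's tracking engine."""
    name: str    # hypothetical label, e.g. "exit_building"
    camera: int  # hypothetical camera identifier

    def occurrences(self, log):
        # log maps a primitive name to the timestamps (seconds) it fired at
        return sorted(log.get(self.name, []))

    def to_xml(self):
        return ET.Element("primitive", name=self.name, camera=str(self.camera))


@dataclass(frozen=True)
class Composite:
    """Two events (primitive or composite) joined by an assumed operator."""
    name: str
    op: str                 # "SEQUENCE", "AND" or "OR" (assumed operator set)
    left: "Primitive | Composite"
    right: "Primitive | Composite"
    max_gap: float = 10.0   # assumed time window in seconds

    def occurrences(self, log):
        a = self.left.occurrences(log)
        b = self.right.occurrences(log)
        if self.op == "OR":
            return sorted(set(a) | set(b))
        if self.op == "SEQUENCE":
            # right must follow left within max_gap; report the right's time
            return sorted({tb for tb in b
                           if any(0 < tb - ta <= self.max_gap for ta in a)})
        if self.op == "AND":
            # both within max_gap in either order; report the later time
            return sorted({max(ta, tb) for ta in a for tb in b
                           if abs(tb - ta) <= self.max_gap})
        raise ValueError(f"unknown operator: {self.op}")

    def to_xml(self):
        # Nested elements mirror the layered definition (hypothetical schema)
        node = ET.Element("composite", name=self.name, op=self.op,
                          max_gap=str(self.max_gap))
        node.append(self.left.to_xml())
        node.append(self.right.to_xml())
        return node


# "coming from the south corridor and then exiting the building"
south_then_exit = Composite(
    "south_then_exit", "SEQUENCE",
    Primitive("leave_south_corridor", camera=1),
    Primitive("exit_building", camera=2),
    max_gap=10.0,
)

# A composite used as an operand of another composite (the layered structure)
alarm = Composite("alarm", "OR", south_then_exit, Primitive("tailgating", camera=2))

# Made-up detection log, purely for illustration
log = {
    "leave_south_corridor": [12.0, 80.0],
    "exit_building": [5.0, 17.5, 200.0],
    "tailgating": [50.0],
}

xml_text = ET.tostring(alarm.to_xml(), encoding="unicode")
```

With this made-up log, south_then_exit.occurrences(log) reports only the exit at 17.5 s, because it is the only exit that follows a corridor departure within the window. Representing a composite with the same interface as a primitive is what lets definitions nest to any depth, and the nested XML produced by to_xml is one plausible shape for a definition file that could be parsed and sent to per-camera engines.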
