Multiple Agent Event Detection and Representation in Videos

We propose a novel method to detect events involving multiple agents in a video and to learn their structure in terms of temporally related chains of sub-events. The proposed method makes three significant contributions over existing frameworks. First, in order to learn the event structure from training videos, we present the concept of a video event graph, which is composed of temporally related sub-events. Using the video event graph, we automatically encode the event dependency graph. The event dependency graph is the learnt event model that depicts the frequency of occurrence of conditionally dependent sub-events. Second, we pose the problem of event detection in novel videos as clustering the maximally correlated sub-events, and use normalized cuts to determine these clusters. The principal assumption made in this work is that events are composed of highly correlated chains of sub-events that have high weights (association) within a cluster and relatively low weights (disassociation) between clusters. These weights (between sub-events) are the likelihood estimates obtained from the event models. Last, we recognize the importance of representing the variations in the temporal order of sub-events occurring in an event, and encode the probabilities directly into our representation. We show results of our learning, detection, and representation of events for videos in the meeting, surveillance, and railroad monitoring domains.
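The clustering step described in the abstract can be illustrated with a minimal sketch of a two-way normalized cut over a sub-event weight matrix. This is not the authors' implementation: the function names `ncut_bipartition` and `ncut_value` are hypothetical, the weights in `W` are invented for illustration (in the paper they would be likelihood estimates obtained from the event models), and the sketch uses the common spectral relaxation, assuming a symmetric matrix in which every sub-event has nonzero total weight.

```python
import numpy as np


def ncut_bipartition(W):
    """Split sub-events into two clusters with the spectral relaxation
    of the normalized cut.

    W is a symmetric, non-negative weight matrix; W[i, j] is the
    association between sub-events i and j. Every row sum is assumed
    to be positive.
    """
    W = np.asarray(W, dtype=float)
    d = W.sum(axis=1)                      # total weight of each sub-event
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L_sym)     # eigenvalues in ascending order
    # Second-smallest eigenvector, mapped back to the generalized problem
    indicator = d_inv_sqrt * eigvecs[:, 1]
    # Threshold at zero (a simple choice; other split points are possible)
    return indicator > 0


def ncut_value(W, labels):
    """Normalized cut cost of a two-way partition given boolean labels."""
    W = np.asarray(W, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    cut = W[labels][:, ~labels].sum()      # weight crossing the partition
    assoc_a = W[labels].sum()              # weight from cluster A to all nodes
    assoc_b = W[~labels].sum()             # weight from cluster B to all nodes
    return cut / assoc_a + cut / assoc_b


# Hypothetical weights: sub-events 0-2 are strongly associated with each
# other, as are sub-events 3-4; weights across the two groups are low.
W = np.array([
    [0.00, 0.90, 0.80, 0.05, 0.02],
    [0.90, 0.00, 0.85, 0.03, 0.04],
    [0.80, 0.85, 0.00, 0.02, 0.05],
    [0.05, 0.03, 0.02, 0.00, 0.90],
    [0.02, 0.04, 0.05, 0.90, 0.00],
])

labels = ncut_bipartition(W)
print(labels, ncut_value(W, labels))
```

Which cluster receives the label `True` depends on the sign of the eigenvector and is arbitrary; only the grouping is meaningful. Detecting more than two events would require applying the split recursively or using several eigenvectors, which this sketch does not cover.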
