Unsupervised Learning of Event Classes from Video

We present a method for unsupervised learning of event classes from videos in which multiple actions might occur simultaneously. We assume that all such activities are produced by an underlying set of event class generators. The learning task is then to recover this generative process from visual data. A set of event classes is derived from the most likely decomposition of the tracks into a set of labelled events, each involving a subset of interacting tracks. Interactions between subsets of tracks are modelled as a relational graph structure that captures qualitative spatio-temporal relationships between these tracks. The posterior probability of a candidate solution favours decompositions in which events of the same class have a similar relational structure and which score well on other measures of well-formedness. A Markov chain Monte Carlo (MCMC) procedure is used to search efficiently for the maximum a posteriori (MAP) solution. This search moves between possible decompositions of the tracks into sets of unlabelled events and, at each move, adds a close-to-optimal labelling for that decomposition using spectral clustering. Experiments on real data show that the discovered event classes are often semantically meaningful and correspond well with ground-truth event classes assigned by hand.
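
The search over decompositions can be illustrated with a minimal Python sketch. It is not the procedure used in this work: the scoring function `log_score`, the pairwise `affinity` matrix, and the single move type in `propose` are all hypothetical stand-ins for the posterior and the moves described above, and the spectral-clustering labelling step is omitted entirely. The proposal is not symmetric and no Hastings correction is applied, so the loop should be read as a stochastic search for a high-scoring decomposition rather than as an exact sampler.

```python
import math
import random


def log_score(partition, affinity):
    """Toy stand-in for a log posterior (an assumption, not the paper's model).

    Rewards groups whose tracks have pairwise affinity above 0.5 and
    penalises groups that mix weakly related tracks.
    """
    total = 0.0
    for group in partition:
        members = sorted(group)
        for i, a in enumerate(members):
            for b in members[i + 1:]:
                total += affinity[a][b] - 0.5
    return total


def propose(partition, rng):
    """Hypothetical move: shift one random track to another or a new group."""
    groups = [set(g) for g in partition]
    src = rng.randrange(len(groups))
    track = rng.choice(sorted(groups[src]))
    groups[src].remove(track)
    dst = rng.randrange(len(groups) + 1)  # last index means "open a new group"
    if dst == len(groups):
        groups.append({track})
    else:
        groups[dst].add(track)
    return [frozenset(g) for g in groups if g]  # drop groups left empty


def mcmc_search(n_tracks, affinity, steps=2000, seed=0):
    """Metropolis-style search that remembers the best decomposition seen."""
    rng = random.Random(seed)
    current = [frozenset([t]) for t in range(n_tracks)]  # start: one track per event
    current_score = log_score(current, affinity)
    best, best_score = current, current_score
    for _ in range(steps):
        candidate = propose(current, rng)
        cand_score = log_score(candidate, affinity)
        # Always accept uphill moves; accept downhill moves with
        # probability exp(score difference).
        if cand_score >= current_score or rng.random() < math.exp(cand_score - current_score):
            current, current_score = candidate, cand_score
            if current_score > best_score:
                best, best_score = current, current_score
    return best, best_score


if __name__ == "__main__":
    # Four hypothetical tracks: 0 and 1 interact strongly, as do 2 and 3.
    affinity = [
        [0.0, 0.9, 0.1, 0.1],
        [0.9, 0.0, 0.1, 0.1],
        [0.1, 0.1, 0.0, 0.9],
        [0.1, 0.1, 0.9, 0.0],
    ]
    best, score = mcmc_search(4, affinity)
    print(sorted(sorted(g) for g in best), score)
```

In a fuller version, each accepted decomposition would additionally be labelled by clustering its events on the similarity of their relational structure, and the score would depend on that labelling; the sketch keeps only the outer search loop to show how moves between decompositions are proposed and accepted.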
