Learning Dynamic Event Descriptions in Image Sequences

Automatic detection of dynamic events in video sequences has a variety of applications, including visual surveillance and monitoring, video highlight extraction, intelligent transportation systems, and video summarization. Learning an accurate description of the events occurring in real-world scenes is challenging because user-labeled data is scarce and the patterns of the events vary widely. These variations arise either from the spatio-temporal nature of the events themselves or from missing or ambiguous interpretations produced by low-level computer vision methods. In this work, we introduce a novel method for representing and classifying events in video sequences using reversible context-free grammars. The grammars are learned with a semi-supervised learning method; more concretely, they are refined iteratively by a search procedure that uses classification entropy as a heuristic cost function. Experimental results on traffic video sequences demonstrate the efficacy of both the learning algorithm and the event detection method.
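To make the entropy-guided search concrete, the sketch below shows one plausible greedy variant in Python. It is a minimal illustration under stated assumptions, not the authors' implementation: `candidate_grammars` (which proposes revised grammars, e.g. by merging or splitting nonterminals) and `classify` (which parses an event sequence and returns a distribution over event classes) are hypothetical helpers supplied by the caller.

```python
import math


def classification_entropy(grammar, unlabeled_sequences, classify):
    """Average entropy of the class distribution the grammar assigns to
    unlabeled event sequences; lower values mean more confident labels."""
    total = 0.0
    for seq in unlabeled_sequences:
        probs = classify(grammar, seq)  # e.g. {"left_turn": 0.7, "u_turn": 0.3}
        total += -sum(p * math.log(p) for p in probs.values() if p > 0)
    return total / max(len(unlabeled_sequences), 1)


def learn_grammar(initial_grammar, unlabeled_sequences, candidate_grammars, classify):
    """Greedy search sketch: repeatedly keep the candidate grammar that most
    reduces the classification entropy, stopping when no candidate improves it."""
    best = initial_grammar
    best_cost = classification_entropy(best, unlabeled_sequences, classify)
    improved = True
    while improved:
        improved = False
        for cand in candidate_grammars(best):
            cost = classification_entropy(cand, unlabeled_sequences, classify)
            if cost < best_cost:
                best, best_cost, improved = cand, cost, True
    return best
```

A beam search or best-first search over the same cost function would follow the same pattern, keeping several low-entropy grammars alive per iteration instead of a single greedy choice.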
