PADS: A Probabilistic Activity Detection Framework for Video Data

There is a growing need to identify the various kinds of activities that occur in videos. In this paper, we first present a logical language, the Probabilistic Activity Description Language (PADL), in which users can specify activities of interest. We then develop a probabilistic framework that assigns to any subvideo of a given video sequence a probability that the subvideo contains the given activity. Finally, we develop two fast algorithms to detect activities within this framework. OffPad finds all minimal segments of a video that contain a given activity with a probability exceeding a given threshold. In contrast, the OnPad algorithm examines a video during playout (rather than afterwards, as OffPad does) and computes the probability that a given activity is occurring, even if the activity is only partially complete. Our prototype Probabilistic Activity Detection System (PADS) implements the framework and the two algorithms, building on top of existing image processing algorithms. We have conducted detailed experiments and compared our approach to four different approaches presented in the literature. We show that, for complex activity definitions, our approach outperforms all the other approaches.
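
To make the notion of a "minimal segment" concrete, the following is a minimal brute-force sketch, not the OffPad algorithm itself, whose details are not given here. It assumes a caller-supplied function `prob(s, e)` returning the probability that frames `s` through `e` contain the activity; the names `minimal_segments` and `noisy_or_prob`, and the per-frame scores, are hypothetical and used only for illustration. A segment is reported when its probability exceeds the threshold and no proper sub-segment's probability does.

```python
from typing import Callable, List, Tuple


def minimal_segments(
    n_frames: int,
    prob: Callable[[int, int], float],
    threshold: float,
) -> List[Tuple[int, int]]:
    """Return all segments (s, e), inclusive, whose probability exceeds
    the threshold and that contain no shorter qualifying segment."""
    # contains[s][e] is True when [s, e] or some segment inside it qualifies.
    contains = [[False] * n_frames for _ in range(n_frames)]
    result = []
    # Process segments from shortest to longest so sub-segments are known.
    for length in range(1, n_frames + 1):
        for s in range(0, n_frames - length + 1):
            e = s + length - 1
            inner = False
            if length > 1:
                # Every proper sub-segment lies inside [s+1, e] or [s, e-1].
                inner = contains[s + 1][e] or contains[s][e - 1]
            qualifies = prob(s, e) > threshold
            if qualifies and not inner:
                result.append((s, e))
            contains[s][e] = qualifies or inner
    return sorted(result)


# Hypothetical per-frame detection scores (illustrative values only).
frame_scores = [0.1, 0.6, 0.6, 0.1, 0.9]


def noisy_or_prob(s: int, e: int) -> float:
    """Toy segment probability (an assumption, not the PADS model):
    noisy-or combination of the per-frame scores in [s, e]."""
    miss = 1.0
    for p in frame_scores[s : e + 1]:
        miss *= 1.0 - p
    return 1.0 - miss


segments = minimal_segments(len(frame_scores), noisy_or_prob, 0.8)
print(segments)
```

The sketch evaluates `prob` on every one of the O(n²) segments, which is the cost a fast detection algorithm would aim to avoid; it is meant only to pin down what the output of such an algorithm looks like.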

[1]  Stephen J. Maybank,et al.  The ADVISOR Visual Surveillance System , 2004 .

[2]  Yaser Sheikh,et al.  CASEE: A Hierarchical Event Representation for the Analysis of Videos , 2004, AAAI.

[3]  Rama Chellappa,et al.  Key Frame-Based Activity Representation Using Antieigenvalues , 2006, ACCV.

[4]  Ramakant Nevatia,et al.  VERL: An Ontology Framework for Representing and Annotating Video Events , 2005, IEEE Multim..

[5]  Eric Horvitz,et al.  Layered representations for human activity recognition , 2002, Proceedings. Fourth IEEE International Conference on Multimodal Interfaces.

[6]  Rama Chellappa,et al.  Activity Modeling Using Event Probability Sequences , 2008, IEEE Transactions on Image Processing.

[7]  James F. Allen Towards a General Theory of Action and Time , 1984, Artif. Intell..

[8]  Jintao Li,et al.  Dynamic Bayesian network based event detection for soccer highlight extraction , 2004, 2004 International Conference on Image Processing, 2004. ICIP '04..

[9]  V. S. Subrahmanian,et al.  Foundations of multimedia database systems , 1996, JACM.

[10]  Aaron F. Bobick,et al.  Recognition of Visual Activities and Interactions by Stochastic Parsing , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[11]  Ronald Fagin,et al.  Combining Fuzzy Information from Multiple Systems , 1999, J. Comput. Syst. Sci..

[12]  Marek J. Sergot,et al.  A logic-based calculus of events , 1989, New Generation Computing.

[13]  Angelo Montanari,et al.  Temporal representation and reasoning in artificial intelligence: Issues and approaches , 2000, Annals of Mathematics and Artificial Intelligence.

[14]  Polle Zellweger,et al.  Automatic temporal layout mechanisms , 1993, MULTIMEDIA '93.

[15]  François Brémond,et al.  Automatic Video Interpretation: A Novel Algorithm for Temporal Scenario Recognition , 2003, IJCAI.

[16]  Rama Chellappa,et al.  "Shape Activity": a continuous-state HMM for moving/deforming shapes with application to abnormal activity detection , 2005, IEEE Transactions on Image Processing.

[17]  W. Eric L. Grimson,et al.  Learning Patterns of Activity Using Real-Time Tracking , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[18]  David Sinclair,et al.  Language-based querying of image collections on the basis of an extensible ontology , 2004, Image Vis. Comput..

[19]  Jr. Hartley Rogers Theory of Recursive Functions and Effective Computability , 1969 .

[20]  Rina Dechter,et al.  Temporal Constraint Networks , 1989, Artif. Intell..

[21]  Xiaokun Li,et al.  A hidden Markov model framework for traffic event detection using video features , 2004, 2004 International Conference on Image Processing, 2004. ICIP '04..

[22]  Aaron F. Bobick,et al.  Parametric Hidden Markov Models for Gesture Recognition , 1999, IEEE Trans. Pattern Anal. Mach. Intell..

[23]  Rama Chellappa,et al.  Identification of humans using gait , 2004, IEEE Transactions on Image Processing.

[24]  K. Selçuk Candan,et al.  View management in multimedia databases , 2000, The VLDB Journal.

[25]  Alex Pentland,et al.  Real-time American Sign Language recognition from video using hidden Markov models , 1995 .

[26]  Yan Huang,et al.  ARGMode - Activity Recognition using Graphical Models , 2003, 2003 Conference on Computer Vision and Pattern Recognition Workshop.

[27]  R. Nevatia,et al.  EDF: A framework for Semantic Annotation of Video , 2005, Tenth IEEE International Conference on Computer Vision Workshops (ICCVW'05).

[28]  Ramakant Nevatia,et al.  Multi-agent event recognition , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[29]  V. S. Subrahmanian,et al.  Detecting Stochastically Scheduled Activities in Video , 2007, IJCAI.

[30]  D. C. Cooper,et al.  Theory of Recursive Functions and Effective Computability , 1969, The Mathematical Gazette.

[31]  Graham Coleman,et al.  Detection and explanation of anomalous activities: representing activities as bags of event n-grams , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[32]  Jianbo Shi,et al.  Detecting unusual activity in video , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[33]  Ramakant Nevatia,et al.  Video-based event recognition: activity representation and probabilistic recognition methods , 2004, Comput. Vis. Image Underst..

[34]  W. Eric L. Grimson,et al.  Simultaneous Pose Estimation and Camera Calibration from Multiple Views , 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.

[35]  Mubarak Shah,et al.  View-Invariant Representation and Recognition of Actions , 2002, International Journal of Computer Vision.

[36]  Alex Pentland,et al.  Coupled hidden Markov models for complex action recognition , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.