Video-based event recognition: activity representation and probabilistic recognition methods

We present a new representation and recognition method for human activities. An activity is considered to be composed of action threads, each executed by a single actor. A single-thread action is represented by a stochastic finite automaton of event states, which are recognized from the trajectory and shape characteristics of the actor's moving blob using Bayesian methods. A multi-agent event is composed of several action threads related by temporal constraints. Multi-agent events are recognized by propagating the constraints and the likelihoods of event threads in a temporal logic network. We present results on real-world data and a performance characterization on perturbed data.
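As a rough illustration of the single-thread idea, the sketch below scores an observation sequence against a left-to-right stochastic automaton of event states using a forward pass. It is not the authors' implementation: the function name `thread_likelihood`, the fixed `stay` probability, and the assumption that per-frame event-state likelihoods are already available (for example, from Bayesian classifiers over trajectory and shape features) are all hypothetical choices made for this example.

```python
def thread_likelihood(obs_probs, stay=0.5):
    """Likelihood that a frame sequence traverses all event states in order.

    obs_probs[t][i] is an assumed, externally supplied value for
    P(observation at frame t | event state i).
    stay is a hypothetical self-transition probability; the automaton
    advances to the next state with probability 1 - stay.
    """
    n_states = len(obs_probs[0])
    # alpha[i]: joint probability of the frames seen so far with the
    # automaton currently in event state i.
    alpha = [0.0] * n_states
    alpha[0] = obs_probs[0][0]  # the thread must start in the first state
    for frame in obs_probs[1:]:
        new_alpha = [0.0] * n_states
        for i in range(n_states):
            reach = alpha[i] * stay  # remain in the same event state
            if i > 0:
                reach += alpha[i - 1] * (1.0 - stay)  # advance from previous state
            new_alpha[i] = reach * frame[i]
        alpha = new_alpha
    # The action is complete only if the automaton ends in the last state.
    return alpha[-1]


# Two event states observed in the expected order, then in reverse order.
in_order = thread_likelihood([[1.0, 0.0], [0.0, 1.0]])
reversed_order = thread_likelihood([[0.0, 1.0], [1.0, 0.0]])
```

In this sketch a sequence that visits the event states out of order receives zero likelihood, which is the property that lets the score of one thread be combined with temporal constraints across several threads.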