Conditional Bayesian networks for action detection

The task of understanding video content has attracted great interest from the computer vision community, alongside the increase in camera-based surveillance at grocery stores, airports, train stations, and similar sites. What makes up a scene (objects) and what happens in the scene (actions) are two important dimensions of video understanding. In this work, we aim to identify both actions and objects in the video; however, we focus only on the objects with which humans interact. We use videos that may contain multiple actions taking place during possibly overlapping intervals. Our system can recognize actions with high intra-class variance, performed in complex environments with objects of different types, sizes, and shapes. As output, we produce structured descriptions of the videos, which identify the subject, the object, the verb, and the interval of each recognized activity.
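The abstract states only that each structured description names a subject, an object, a verb, and an interval, and that intervals may overlap. The following is a minimal sketch of such an output record, assuming frame-index intervals (inclusive at both ends). The names `ActivityDescription`, `overlaps`, and `concurrent_pairs`, along with the example labels, are hypothetical illustrations. They are not the paper's representation, and the sketch says nothing about how the conditional Bayesian networks infer these descriptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ActivityDescription:
    """One recognized activity: who did what to which object, and when.

    Hypothetical record layout; intervals are frame indices, inclusive.
    """
    subject: str          # identifier of the acting person (illustrative)
    verb: str             # recognized action label
    obj: Optional[str]    # object the person interacts with, or None
    start: int            # first frame of the activity interval
    end: int              # last frame of the activity interval (inclusive)

    def __post_init__(self) -> None:
        # Reject malformed intervals early.
        if self.end < self.start:
            raise ValueError("interval end must not precede its start")


def overlaps(a: ActivityDescription, b: ActivityDescription) -> bool:
    """True when the two closed intervals share at least one frame."""
    return a.start <= b.end and b.start <= a.end


def concurrent_pairs(
    descs: List[ActivityDescription],
) -> List[Tuple[ActivityDescription, ActivityDescription]]:
    """Return every pair of activities whose intervals overlap."""
    pairs = []
    for i in range(len(descs)):
        for j in range(i + 1, len(descs)):
            if overlaps(descs[i], descs[j]):
                pairs.append((descs[i], descs[j]))
    return pairs


if __name__ == "__main__":
    # Made-up descriptions for a single video.
    video = [
        ActivityDescription("person_1", "pick up", "cup", 10, 40),
        ActivityDescription("person_1", "drink", "cup", 35, 80),
        ActivityDescription("person_2", "open", "door", 90, 120),
    ]
    for a, b in concurrent_pairs(video):
        print(f"{a.verb} [{a.start}, {a.end}] overlaps {b.verb} [{b.start}, {b.end}]")
```

Run on the made-up data, only "pick up" and "drink" share frames (35 to 40), so one overlapping pair is reported. Closed intervals are a design choice here: two activities that touch at a single frame count as overlapping.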
