Simultaneous tracking and action recognition for single actor human actions

This paper presents an approach to simultaneously tracking human pose and recognizing actions in video. This is achieved by combining a Dynamic Bayesian Action Network (DBAN) with 2D body part models. The existing DBAN implementation relies on fairly weak observation features, which limits recognition accuracy. In this work, we use a 2D body part model for accurate pose alignment, which in turn improves both pose estimation and action recognition accuracy. To compensate for the additional time required for alignment, we use an action entropy-based scheme to determine the minimum number of states to maintain in each frame while avoiding sample impoverishment. We also present an approach to automating keypose selection for learning 3D action models from a few annotations. We demonstrate our approach on a hand gesture dataset with 500 action sequences and show that our algorithm achieves a 6% improvement in accuracy over DBAN.
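The entropy-based pruning idea can be illustrated with a small sketch. This is a hypothetical illustration under stated assumptions, not the paper's method. The abstract does not specify how entropy maps to a state count. The following are all assumptions: a linear mapping between hypothetical bounds `n_min` and `n_max`, a state representation of `(action_label, weight)` pairs, and the function names. The intuition is that a peaked action distribution (low entropy) needs fewer hypotheses, while an ambiguous one (high entropy) needs more. The floor `n_min` keeps the state set from collapsing, which is a simple guard against sample impoverishment.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical state representation (assumption): (action_label, weight) pairs.
State = Tuple[str, float]


def action_entropy(action_probs: Dict[str, float]) -> float:
    """Shannon entropy of the action distribution, normalized to [0, 1]."""
    if len(action_probs) <= 1:
        return 0.0
    total = sum(action_probs.values())
    # Zero-probability actions contribute nothing to the entropy sum.
    probs = [p / total for p in action_probs.values() if p > 0]
    h = -sum(p * math.log(p) for p in probs)
    return h / math.log(len(action_probs))


def num_states_to_keep(action_probs: Dict[str, float], n_min: int, n_max: int) -> int:
    """Map normalized entropy linearly onto [n_min, n_max] (assumed mapping)."""
    h = min(max(action_entropy(action_probs), 0.0), 1.0)
    n = n_min + math.ceil(h * (n_max - n_min))
    # Clamp to guard against floating-point round-off at the endpoints.
    return min(max(n, n_min), n_max)


def prune_states(states: List[State], n_min: int, n_max: int) -> List[State]:
    """Keep the highest-weighted states; n_min is a floor against impoverishment."""
    # Marginalize state weights into a per-action distribution.
    action_probs: Dict[str, float] = {}
    for label, weight in states:
        action_probs[label] = action_probs.get(label, 0.0) + weight
    n = num_states_to_keep(action_probs, n_min, n_max)
    ranked = sorted(states, key=lambda s: s[1], reverse=True)
    return ranked[:n]


# Usage: a confident frame (mostly "wave") keeps only a few states.
states = [("wave", 0.97), ("wave", 0.01), ("point", 0.01), ("point", 0.01)]
print(len(prune_states(states, n_min=1, n_max=4)))  # prints 2
```

A uniform distribution over actions would instead keep `n_max` states. Any monotone mapping from entropy to state count would serve the same purpose. The linear one is chosen here only for simplicity.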
