On-line human action recognition by combining joint tracking and key pose recognition

In this paper, we present a boosting approach that combines pose estimation with upper-body tracking to recognize human actions on-line. Instead of using a predefined pose to initialize the human skeleton, we construct a key pose database indexed by depth HOG features. When a user enters the camera view, we automatically search the database to obtain the initial skeleton. We then use a particle filter to track the parts of the upper body. At the same time, we feed the tracked joints into hidden Markov models (HMMs) to spot and recognize the performed action on-line. To rectify tracking errors, we feed the action recognition results back into the tracker and reuse the key pose database to reinforce the tracking process. The contributions of the proposed approach are threefold. First, our method recognizes human poses and actions at the same time. Second, with the key pose database and the action recognition results as feedback, the tracking process becomes more efficient and accurate. Third, we propose a spotting method based on the gradient of the HMM probabilities, which enables on-line spotting and recognition. Experimental results demonstrate the effectiveness of the proposed approach.
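The idea of spotting from the gradient of HMM probabilities can be illustrated with a minimal sketch. The abstract does not give the spotting rule, so everything below is an assumption made for illustration: a small discrete HMM stands in for models driven by tracked joints, the "gradient" is taken as the frame-to-frame change of the cumulative log-likelihood, and a frame counts as part of an action when that change stays above a hypothetical threshold. The function names `forward_loglik_trace` and `spot_segments` are likewise invented here.

```python
import math

def forward_loglik_trace(obs, start, trans, emit):
    """Scaled forward algorithm: cumulative log P(o_1..o_t) for every frame t."""
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    trace, loglik = [], 0.0
    for t, o in enumerate(obs):
        if t > 0:
            # Predict the next state distribution, then weight by the emission.
            alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][o]
                     for j in range(n)]
        scale = sum(alpha)               # equals P(o_t | o_1..o_{t-1})
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]
        trace.append(loglik)
    return trace

def spot_segments(trace, threshold):
    """Assumed rule: a frame is 'in action' while the log-likelihood gradient
    (per-frame increment) is at or above the threshold."""
    grads = [trace[0]] + [trace[t] - trace[t - 1] for t in range(1, len(trace))]
    segments, begin = [], None
    for t, g in enumerate(grads):
        if g >= threshold and begin is None:
            begin = t
        elif g < threshold and begin is not None:
            segments.append((begin, t - 1))
            begin = None
    if begin is not None:
        segments.append((begin, len(grads) - 1))
    return segments

# Hypothetical two-state left-to-right model; symbol 2 plays the role of
# non-action motion that the model explains poorly.
start = [1.0, 0.0]
trans = [[0.7, 0.3], [0.0, 1.0]]
emit = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]
obs = [2, 2, 0, 0, 1, 1, 2, 2]

trace = forward_loglik_trace(obs, start, trans, emit)
print(spot_segments(trace, threshold=-1.5))  # [(2, 5)]
```

Because the increments only depend on frames seen so far, the segment boundaries can be emitted as frames arrive, which is what makes this kind of rule usable on-line. A real system would need continuous observation models and a threshold tuned per action.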
