Exemplar-based human action pose correction and tagging

The launch of Xbox Kinect has built a very successful computer vision product and made a big impact to the gaming industry; this sheds lights onto a wide variety of potential applications related to action recognition. The accurate estimation of human poses from the depth image is universally a critical step. However, existing pose estimation systems exhibit failures when faced severe occlusion. In this paper, we propose an exemplar-based method to learn to correct the initially estimated poses. We learn an inhomogeneous systematic bias by leveraging the exemplar information within specific human action domain. Our algorithm is illustrated on both joint-based skeleton correction and tag prediction. In the experiments, significant improvement is observed over the contemporary approaches, including what is delivered by the current Kinect system.

[1]  Dorin Comaniciu,et al.  Kernel-Based Object Tracking , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[2]  Carl E. Rasmussen,et al.  Gaussian Processes for Machine Learning (GPML) Toolbox , 2010, J. Mach. Learn. Res..

[3]  Hyeong-Seok Ko,et al.  A physically-based motion retargeting filter , 2005, TOGS.

[4]  J. Concato,et al.  A simulation study of the number of events per variable in logistic regression analysis. , 1996, Journal of clinical epidemiology.

[5]  Jitendra Malik,et al.  Poselets: Body part detectors trained using 3D human pose annotations , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[6]  Pietro Perona,et al.  Cascaded pose regression , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[7]  Sung Yong Shin,et al.  Motion fairing , 1996, Proceedings Computer Animation '96.

[8]  Jinxiang Chai,et al.  Example-Based Human Motion Denoising , 2010, IEEE Transactions on Visualization and Computer Graphics.

[9]  Allan Birnbaum,et al.  A UNIFIED THEORY OF ESTIMATION , 1961 .

[10]  Andy Liaw,et al.  Classification and Regression by randomForest , 2007 .

[11]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[12]  Bernhard Schölkopf,et al.  New Support Vector Algorithms , 2000, Neural Computation.

[13]  Carl E. Rasmussen,et al.  Gaussian processes for machine learning , 2005, Adaptive computation and machine learning.

[14]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[15]  Juan Carlos Niebles,et al.  Modeling Temporal Structure of Decomposable Motion Segments for Activity Classification , 2010, ECCV.

[16]  Michael Isard,et al.  CONDENSATION—Conditional Density Propagation for Visual Tracking , 1998, International Journal of Computer Vision.

[17]  Toby Sharp,et al.  Real-time human pose recognition in parts from single depth images , 2011, CVPR.

[18]  Serge J. Belongie,et al.  Behavior recognition via sparse spatio-temporal features , 2005, 2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance.

[19]  J. Ross Quinlan,et al.  Induction of Decision Trees , 1986, Machine Learning.

[20]  Vincent Lepetit,et al.  Randomized trees for real-time keypoint recognition , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[21]  Richard O. Duda,et al.  Pattern classification and scene analysis , 1974, A Wiley-Interscience publication.