Kinect Posture Reconstruction Based on a Local Mixture of Gaussian Process Models

Depth-sensor-based 3D human motion estimation hardware such as Kinect has recently made interactive applications more popular. However, accurately recognizing postures from a single depth camera remains challenging because depth images are inherently noisy and users perform self-occluding actions. In this paper, we propose a new real-time probabilistic framework to enhance the accuracy of live captured postures that belong to one of the action classes in the database. We adopt the Gaussian Process model as a prior to leverage the position data obtained from Kinect and a marker-based motion capture system. We also incorporate a temporal consistency term into the optimization framework to constrain the velocity variations between successive frames. To ensure that the reconstructed posture resembles the accurate parts of the observed posture, we embed a set of joint reliability measurements into the optimization framework. A major drawback of the Gaussian Process is its cubic learning complexity on a large database, which stems from inverting the covariance matrix. To solve this problem, we propose a new method based on a local mixture of Gaussian Processes, in which Gaussian Processes are defined in local regions of the state space. Because the sample size in each local Gaussian Process is significantly smaller, the learning time is greatly reduced. At the same time, prediction is faster because the weighted mean prediction for a given sample is determined by the nearby local models only. Our system also allows a specific local Gaussian Process to be updated incrementally in real time, which improves its ability to adapt to run-time postures that differ from those in the database. Experimental results demonstrate that our system can generate high-quality postures even under severe self-occlusion, which is beneficial for real-time applications such as motion-based gaming and sports training.
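
The local-mixture idea described above can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes a squared-exponential kernel, a plain k-means partition of the input space, and distance-based weights over the nearest local models. All names (`LocalGP`, `fit_local_gps`, `predict_mixture`, `add_sample`) and all parameter values are hypothetical. The temporal consistency term and the joint reliability measurements are omitted, and the incremental update is simplified to refitting one local model.

```python
import numpy as np


def rbf_kernel(A, B, length_scale):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length_scale**2)


class LocalGP:
    """GP regressor over one local region (hypothetical helper class)."""

    def __init__(self, X, Y, length_scale=1.0, noise=1e-4):
        self.length_scale, self.noise = length_scale, noise
        self._fit(X, Y)

    def _fit(self, X, Y):
        self.X, self.Y = X, Y
        self.center = X.mean(axis=0)
        K = rbf_kernel(X, X, self.length_scale) + self.noise * np.eye(len(X))
        L = np.linalg.cholesky(K)  # cubic only in the local sample count
        self.alpha = np.linalg.solve(L.T, np.linalg.solve(L, Y))

    def add_sample(self, x, y):
        """Incremental update, simplified here to refitting this model only."""
        self._fit(np.vstack([self.X, x]), np.vstack([self.Y, y]))

    def predict(self, x):
        k = rbf_kernel(x[None, :], self.X, self.length_scale)
        return (k @ self.alpha)[0]


def _assign(X, centers):
    """Index of the closest center for every row of X."""
    return ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1).argmin(axis=1)


def fit_local_gps(X, Y, n_local, n_iter=20, seed=0, **gp_kwargs):
    """Partition X with k-means (an assumption) and fit one GP per region."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=n_local, replace=False)].copy()
    for _ in range(n_iter):
        labels = _assign(X, centers)
        for j in range(n_local):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    labels = _assign(X, centers)
    return [LocalGP(X[labels == j], Y[labels == j], **gp_kwargs)
            for j in range(n_local) if np.any(labels == j)]


def predict_mixture(models, x, n_nearest=2, bandwidth=1.0):
    """Weighted mean of the predictions of the nearest local models only."""
    d2 = np.array([np.sum((x - m.center) ** 2) for m in models])
    nearest = np.argsort(d2)[:n_nearest]
    # Subtracting the smallest distance keeps the weights from underflowing.
    w = np.exp(-0.5 * (d2[nearest] - d2[nearest].min()) / bandwidth**2)
    w /= w.sum()
    return sum(wi * models[i].predict(x) for wi, i in zip(w, nearest))


if __name__ == "__main__":
    # Toy data standing in for (observed posture, accurate posture) pairs.
    X = np.linspace(0.0, 6.0, 120)[:, None]
    Y = np.hstack([np.sin(X), np.cos(X)])
    models = fit_local_gps(X, Y, n_local=4)
    print(predict_mixture(models, np.array([2.5])))
```

With N training samples split evenly across M local models, each Cholesky factorization costs on the order of (N/M)^3 rather than N^3, which is the source of the reduced learning time. A production version would likely replace the refit in `add_sample` with a low-rank Cholesky update so that the incremental step avoids a full refactorization.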
