Articulated and Generalized Gaussian Kernel Correlation for Human Pose Estimation

In this paper, we propose an articulated and generalized Gaussian kernel correlation (GKC)-based framework for human pose estimation. We first derive a unified GKC representation that generalizes the previous sum of Gaussians (SoG)-based methods for the similarity measure between a template and an observation both of which are represented by various SoG variants. Then, we develop an articulated GKC (AGKC) by integrating a kinematic skeleton in a multivariate SoG template that supports subject-specific shape modeling and articulated pose estimation for both the full body and the hands. We further propose a sequential (body/hand) pose tracking algorithm by incorporating three regularization terms in the AGKC function, including visibility, intersection penalty, and pose continuity. Our tracking algorithm is simple yet effective and computationally efficient. We evaluate our algorithm on two benchmark depth data sets. The experimental results are promising and competitive when compared with the state-of-the-art algorithms.

[1]  Baba C. Vemuri,et al.  Robust Point Set Registration Using Gaussian Mixture Models , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2]  Andriy Myronenko,et al.  Point Set Registration: Coherent Point Drift , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  Hans-Peter Seidel,et al.  Real-Time Hand Tracking Using a Sum of Anisotropic Gaussians Model , 2014, 2014 2nd International Conference on 3D Vision.

[4]  David Sweeney,et al.  Learning to be a depth camera for close-range human capture and interaction , 2014, ACM Trans. Graph..

[5]  Pascal Fua,et al.  Articulated Soft Objects for Multiview Shape and Motion Capture , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[6]  Sebastian Thrun,et al.  Real time motion capture using a single time-of-flight camera , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[7]  Tae-Kyun Kim,et al.  Real-Time Articulated Hand Pose Estimation Using Semi-supervised Transductive Regression Forests , 2013, 2013 IEEE International Conference on Computer Vision.

[8]  Mircea Nicolescu,et al.  Vision-based hand pose estimation: A review , 2007, Comput. Vis. Image Underst..

[9]  Shiloh L. Dockstader,et al.  Prediction for human motion tracking failures , 2006, IEEE Transactions on Image Processing.

[10]  Chen Qian,et al.  Realtime and Robust Hand Tracking from Depth , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Guoliang Fan,et al.  Generalized Sum of Gaussians for Real-Time Human Pose Tracking from a Single Depth Sensor , 2015, 2015 IEEE Winter Conference on Applications of Computer Vision.

[12]  Andrew W. Fitzgibbon,et al.  The Vitruvian manifold: Inferring dense correspondences for one-shot human pose estimation , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[13]  Ruigang Yang,et al.  Real-Time Simultaneous Pose and Shape Estimation for Articulated Objects Using a Single Depth Camera , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  Hans-Peter Seidel,et al.  Fast articulated motion tracking using a sums of Gaussians body model , 2011, 2011 International Conference on Computer Vision.

[15]  Hans-Peter Seidel,et al.  A data-driven approach for real-time full body pose reconstruction from a depth camera , 2011, 2011 International Conference on Computer Vision.

[16]  B. Ripley,et al.  Robust Statistics , 2018, Wiley Series in Probability and Statistics.

[17]  Adrian Hilton,et al.  A survey of advances in vision-based human motion capture and analysis , 2006, Comput. Vis. Image Underst..

[18]  Antonis A. Argyros,et al.  Full DOF tracking of a hand interacting with an object by modeling occlusions and physical constraints , 2011, 2011 International Conference on Computer Vision.

[19]  Antonis A. Argyros,et al.  Efficient model-based 3D tracking of hand articulations using Kinect , 2011, BMVC.

[20]  Luc Van Gool,et al.  Functional categorization of objects using real-time markerless motion capture , 2011, CVPR 2011.

[21]  Takeo Kanade,et al.  A correlation-based model prior for stereo , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[22]  David W. Scott,et al.  From Kernels to Mixtures , 2001, Technometrics.

[23]  Anand Rangarajan,et al.  Maximum Likelihood Wavelet Density Estimation With Applications to Image and Shape Matching , 2008, IEEE Transactions on Image Processing.

[24]  Christian Theobalt,et al.  Monocular Pose Capture with a Depth Camera Using a Sums-of-Gaussians Body Model , 2013, GCPR.

[25]  Hans-Peter Seidel,et al.  Motion capture using joint skeleton tracking and surface estimation , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[26]  Quan-Sen Sun,et al.  Geodesic Invariant Feature: A Local Descriptor in Depth , 2015, IEEE Transactions on Image Processing.

[27]  Manolis Papadrakakis,et al.  A Hybrid Particle Swarm—Gradient Algorithm for Global Structural Optimization , 2010, Comput. Aided Civ. Infrastructure Eng..

[28]  Michael R. Lyu,et al.  A hybrid particle swarm optimization-back-propagation algorithm for feedforward neural network training , 2007, Appl. Math. Comput..

[29]  Antti Oulasvirta,et al.  Interactive Markerless Articulated Hand Motion Tracking Using RGB and Depth Data , 2013, 2013 IEEE International Conference on Computer Vision.

[30]  Paul J. Besl,et al.  A Method for Registration of 3-D Shapes , 1992, IEEE Trans. Pattern Anal. Mach. Intell..

[31]  Alex Pentland,et al.  Pfinder: real-time tracking of the human body , 1996, Proceedings of the Second International Conference on Automatic Face and Gesture Recognition.

[32]  S T Roweis,et al.  Nonlinear dimensionality reduction by locally linear embedding. , 2000, Science.

[33]  Ruigang Yang,et al.  Real-Time Simultaneous Pose and Shape Estimation for Articulated Objects Using a Single Depth Camera. , 2016, IEEE transactions on pattern analysis and machine intelligence.

[34]  Radu Horaud,et al.  Human Motion Tracking by Registering an Articulated Surface to 3D Points and Normals , 2020, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[35]  Radu Horaud,et al.  Rigid and Articulated Point Registration with Expectation Conditional Maximization , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[36]  Ronald Poppe,et al.  Vision-based human motion analysis: An overview , 2007, Comput. Vis. Image Underst..

[37]  Yue Joseph Wang,et al.  Information-theoretic matching of two point sets , 2002, IEEE Trans. Image Process..

[38]  Luc Van Gool,et al.  Motion Capture of Hands in Action Using Discriminative Salient Points , 2012, ECCV.

[39]  Guoliang Fan,et al.  Fast Human Pose Tracking with a Single Depth Sensor Using Sum of Gaussians Models , 2014, ISVC.

[40]  Hans-Peter Seidel,et al.  Real-Time Body Tracking with One Depth Camera and Inertial Sensors , 2013, 2013 IEEE International Conference on Computer Vision.

[41]  Andrew W. Fitzgibbon,et al.  Real-time human pose recognition in parts from single depth images , 2011, CVPR 2011.

[42]  Jinxiang Chai,et al.  Accurate realtime full-body motion capture using a single depth camera , 2012, ACM Trans. Graph..

[43]  Ivor W. Tsang,et al.  A Hybrid PSO-BFGS Strategy for Global Optimization of Multimodal Functions , 2011, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[44]  Jorge Nocedal,et al.  On the limited memory BFGS method for large scale optimization , 1989, Math. Program..

[45]  Angel Domingo Sappa,et al.  The Richer Representation the Better Registration , 2013, IEEE Transactions on Image Processing.

[46]  David J. Fleet,et al.  Stochastic Tracking of 3D Human Figures Using 2D Image Motion , 2000, ECCV.

[47]  Ruigang Yang,et al.  Accurate 3D pose estimation from a single depth image , 2011, 2011 International Conference on Computer Vision.

[48]  Antonis A. Argyros,et al.  Tracking the articulated motion of two strongly interacting hands , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[49]  Sebastian Thrun,et al.  Real-time identification and localization of body parts from depth images , 2010, 2010 IEEE International Conference on Robotics and Automation.

[50]  Takeo Kanade,et al.  A Correlation-Based Approach to Robust Point Set Registration , 2004, ECCV.

[51]  Sebastian Thrun,et al.  Real-Time Human Pose Tracking from Range Data , 2012, ECCV.

[52]  Alex Pentland,et al.  Pfinder: Real-Time Tracking of the Human Body , 1997, IEEE Trans. Pattern Anal. Mach. Intell..