Tracking Human-like Natural Motion Using Deep Recurrent Neural Networks

The Kinect skeleton tracker achieves considerable human body tracking performance in a convenient and low-cost manner. However, the tracker often captures unnatural human poses, such as discontinuous and jittery motions, when self-occlusions occur. Most approaches tackle this problem by using multiple Kinect sensors in a workspace. The measurements from the different sensors are then combined within a Kalman filter framework, or sensor fusion is formulated as an optimization problem. However, these methods usually require heuristics to measure the reliability of the measurements observed from each Kinect sensor. In this paper, we developed a method to improve the Kinect skeleton using a single Kinect sensor, in which a supervised learning technique was employed to correct unnatural tracking motions. Specifically, deep recurrent neural networks were used to improve the joint positions and velocities of the Kinect skeleton, and three methods were proposed to integrate the refined positions and velocities for further enhancement. Moreover, we proposed a novel measure to evaluate the naturalness of captured motions. We evaluated the proposed approach by comparison with the ground truth obtained using a commercial optical marker-based motion capture system.
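The abstract does not say what the three integration methods are. As an illustration only, the sketch below shows one simple way that separately refined joint positions and velocities could be combined: a complementary filter that blends a velocity-based prediction with the refined position at each frame. The function name `fuse_positions_velocities`, the blending weight `alpha`, the 20-joint array layout, and the 30 Hz frame interval are all assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def fuse_positions_velocities(positions, velocities, dt, alpha=0.8):
    """Blend refined joint positions with integrated refined velocities.

    Hypothetical helper (not the paper's method).
    positions, velocities: arrays of shape (frames, joints, 3).
    dt: time between frames in seconds.
    alpha: weight on the velocity-based prediction, in [0, 1].
    """
    positions = np.asarray(positions, dtype=float)
    velocities = np.asarray(velocities, dtype=float)
    if positions.shape != velocities.shape:
        raise ValueError("positions and velocities must have the same shape")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")

    fused = np.empty_like(positions)
    fused[0] = positions[0]  # no earlier frame to predict from
    for t in range(1, len(positions)):
        # Predict the current pose from the previous fused pose and velocity.
        predicted = fused[t - 1] + velocities[t] * dt
        # Pull the prediction toward the refined position to limit drift.
        fused[t] = alpha * predicted + (1.0 - alpha) * positions[t]
    return fused


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames, joints, dt = 200, 20, 1.0 / 30.0  # assumed layout and frame rate
    true_vel = np.full((frames, joints, 3), 0.5)
    true_pos = np.arange(frames)[:, None, None] * dt * true_vel
    noisy_pos = true_pos + rng.normal(scale=0.02, size=true_pos.shape)
    fused = fuse_positions_velocities(noisy_pos, true_vel, dt)
    print("mean error before:", np.abs(noisy_pos - true_pos).mean())
    print("mean error after: ", np.abs(fused - true_pos).mean())
```

A larger `alpha` trusts the velocity stream more and yields smoother output, while a smaller `alpha` follows the position stream more closely and corrects drift faster. Whether any of the paper's three methods resembles this scheme cannot be determined from the abstract.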
