Deep neural networks for human pose estimation from a very low resolution depth image

This paper determines and evaluates the most efficient neural network architecture for use as a multiple regression network that localizes human body joints in 3D space from a single low-resolution depth image. The main challenge was to achieve high localization precision despite the noisy, coarse representation of the human body that a depth sensor observes from a large distance. The regression network was expected to infer the relations between body parts from the depth image, extract the joint locations, and output the coordinates that define the body pose. A dataset of 200,000 realistic depth images of a 3D body model was created, and numerous architectures were then trained and tested on it, including a feedforward multilayer perceptron and deep convolutional neural networks. The training and evaluation results are reported and discussed. The most accurate deep network was further trained and evaluated on an augmented dataset of depth images. Its accuracy was similar to that of a reference Kinect algorithm, with the added benefits of fast processing and a significantly lower required sensor resolution, as it used 100 times fewer pixels than the Kinect depth sensor. The method was robust to sensor noise and tolerated imprecise depth measurements. Finally, the results were compared with those of the VGG, MobileNet, and ResNet architectures.
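The input conditions and the precision measure described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the 640x480 source frame, the block-averaging downsampler, the Gaussian noise model, the noise level, and all function names are assumptions chosen for illustration. The abstract states only that the input has 100 times fewer pixels than the Kinect depth sensor and that the method tolerates imprecise depth measurements.

```python
import math
import random


def downsample_depth(depth, factor):
    """Block-average a 2D depth map (list of rows) by an integer factor."""
    rows, cols = len(depth), len(depth[0])
    out = []
    for r in range(0, rows - rows % factor, factor):
        out_row = []
        for c in range(0, cols - cols % factor, factor):
            block = [depth[r + i][c + j]
                     for i in range(factor) for j in range(factor)]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out


def add_depth_noise(depth, sigma, rng):
    """Add zero-mean Gaussian noise to every pixel (assumed noise model)."""
    return [[d + rng.gauss(0.0, sigma) for d in row] for row in depth]


def mean_joint_error(predicted, reference):
    """Mean Euclidean distance between predicted and reference 3D joints."""
    distances = [math.dist(p, q) for p, q in zip(predicted, reference)]
    return sum(distances) / len(distances)


# Hypothetical 640x480 depth frame in meters (a smooth synthetic ramp).
high = [[3.0 + 0.001 * c for c in range(640)] for _ in range(480)]

# Reducing each side by 10 yields 100 times fewer pixels.
low = downsample_depth(high, 10)

# Simulated sensor imprecision; sigma = 0.02 m is an arbitrary choice.
noisy = add_depth_noise(low, 0.02, random.Random(0))

pixel_ratio = (len(high) * len(high[0])) / (len(low) * len(low[0]))
```

A regression network of the kind described would take an image such as `noisy` as input and output one (x, y, z) triple per joint; `mean_joint_error` is one plausible way to score those outputs against reference joint positions.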
