Architecture and Parameter Analysis of Convolutional Neural Networks for Hand Tracking

Hand tracking based on deep learning has made good progress, but few studies have examined how the architecture and parameters of a Convolutional Neural Network (CNN) affect tracking accuracy. In this paper, we propose a new method to analyze the factors that influence gesture tracking. First, we build a database of gesture images and their corresponding gesture parameters from a virtual 3D human hand, and construct CNN models on it. We then investigate related factors that affect hand-tracking performance, such as network structure, number of training iterations, data augmentation, and Dropout. Finally, we evaluate the objective parameters on the virtual hand and perform a subjective evaluation on real hands extracted from video. The results show that, for a fixed amount of hand training data, increasing the number of convolution kernels or convolution layers has little effect on accuracy for real gestures, whereas the effect of data augmentation is pronounced. For real gestures, good results can be obtained with an appropriate number of iterations and a Dropout ratio of about 20%–30%. This work provides a foundation for future applied research on hand tracking.
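The factors varied in the study (number of convolution kernels, depth, and the Dropout ratio) can be illustrated with a minimal sketch. This is not the authors' network; it is a hypothetical single convolution layer with ReLU and inverted Dropout written in plain NumPy, with the tunable quantities exposed as parameters:

```python
import numpy as np

def conv2d(x, kernels):
    """Valid 2D convolution of a single-channel image with a bank of kernels.
    x: (H, W), kernels: (n_k, kH, kW) -> output (n_k, H-kH+1, W-kW+1)."""
    n_k, kH, kW = kernels.shape
    H, W = x.shape
    out = np.zeros((n_k, H - kH + 1, W - kW + 1))
    for k in range(n_k):
        for i in range(H - kH + 1):
            for j in range(W - kW + 1):
                out[k, i, j] = np.sum(x[i:i + kH, j:j + kW] * kernels[k])
    return out

def dropout(x, ratio, rng, train=True):
    """Inverted Dropout: zero a `ratio` fraction of units during training
    and rescale the survivors, so inference needs no correction."""
    if not train or ratio == 0.0:
        return x
    mask = rng.random(x.shape) >= ratio
    return x * mask / (1.0 - ratio)

rng = np.random.default_rng(0)
image = rng.random((8, 8))                   # stand-in for a gesture image
kernels = rng.standard_normal((4, 3, 3))     # 4 kernels: one factor the paper varies
features = np.maximum(conv2d(image, kernels), 0.0)  # ReLU activation
features = dropout(features, ratio=0.25, rng=rng)   # ratio in the paper's 20%-30% range
print(features.shape)  # -> (4, 6, 6)
```

Stacking more such layers or enlarging the kernel bank corresponds to the depth and kernel-count factors analyzed in the paper; the `ratio` argument corresponds to the Dropout ratio.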
