Human arm pose modeling with learned features using joint convolutional neural network

This paper proposes a new approach to modeling human arm pose configuration from still images, based on learned features and arm part structure constraints. No assumptions are made about the subjects' clothing style, action category, or background, so the model has to accommodate these uncertainties. The proposed approach uses an energy model that incorporates the dependence relationships among arm joints and arm parts, where the potentials represent their occurrence probabilities. Positive and negative instances are extracted from the input image as multi-scale image patches, which capture the details of arm joints and arm parts. A joint convolutional neural network is then developed for feature extraction. The local rigidity of each arm part is used to constrain the occurrence of arm joints and arm parts, and these constraints can be incorporated efficiently into dynamic programming for arm pose inference. Experimental results on a variety of still images show better performance than alternative approaches that use hand-crafted features.
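To make the inference step concrete, the following is a minimal sketch of min-sum dynamic programming over a chain of arm joints (shoulder, elbow, wrist). It is an illustration under stated assumptions, not the paper's implementation: the function name `infer_arm_pose`, the squared length-deviation penalty standing in for the local rigidity constraint, and the `rigidity_weight` parameter are all hypothetical. The unary costs are assumed to come from some appearance model, for example negative log occurrence probabilities from a network.

```python
import math


def infer_arm_pose(unary, candidates, rest_lengths, rigidity_weight=1.0):
    """Min-sum dynamic programming over a chain of joints (illustrative).

    unary[j][k]      -- assumed appearance cost of joint j at candidates[j][k]
    candidates[j]    -- list of (x, y) candidate locations for joint j
    rest_lengths[j]  -- assumed expected length of the part joining j and j+1
    Returns (total_energy, list of chosen (x, y) locations).
    """
    n = len(unary)

    def pair_cost(j, a, b):
        # Hypothetical rigidity term: penalize deviation from the part length.
        (xa, ya), (xb, yb) = candidates[j][a], candidates[j + 1][b]
        dist = math.hypot(xa - xb, ya - yb)
        return rigidity_weight * (dist - rest_lengths[j]) ** 2

    # Forward pass: best cost of reaching each candidate of each joint.
    cost = [list(unary[0])]
    back = []
    for j in range(1, n):
        row, ptr = [], []
        for b in range(len(candidates[j])):
            prev = range(len(candidates[j - 1]))
            best_a = min(prev, key=lambda a: cost[j - 1][a] + pair_cost(j - 1, a, b))
            row.append(cost[j - 1][best_a] + pair_cost(j - 1, best_a, b) + unary[j][b])
            ptr.append(best_a)
        cost.append(row)
        back.append(ptr)

    # Backward pass: trace the minimizing candidates from wrist to shoulder.
    k = min(range(len(cost[-1])), key=lambda i: cost[-1][i])
    total = cost[-1][k]
    path = [k]
    for ptr in reversed(back):
        k = ptr[k]
        path.append(k)
    path.reverse()
    return total, [candidates[j][path[j]] for j in range(n)]


if __name__ == "__main__":
    candidates = [[(0, 0)], [(3, 4), (10, 0)], [(3, 9), (6, 8)]]
    unary = [[0.0], [0.5, 0.1], [0.2, 0.3]]
    energy, pose = infer_arm_pose(unary, candidates, rest_lengths=[5.0, 5.0])
    print(energy, pose)
```

In this toy example the second elbow candidate has the lower appearance cost, but it lies far from the expected upper-arm length, so the rigidity term steers the solution to the first candidate. With K candidates per joint, the chain is solved in O(K^2) work per part rather than by enumerating all joint combinations.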
