Human Pose Tracking from RGB Inputs

In Virtual and Augmented Reality, obtaining the configuration of human poses is fundamental for systems to provide natural gesture-based interaction and a general understanding of user body behavior. Obtaining such poses from RGB images captured by cameras enables a wide range of applications in security (e.g., local activity monitoring), healthcare (e.g., postural analysis), and entertainment (e.g., motion capture for games and animation). However, acquiring human poses solely from RGB images is still considered a challenge, since purely visual data does not explicitly provide the location of the human body joints (keypoints, in pixel coordinates) in the image. In this work we propose a machine learning method, more specifically a deep learning method based on convolutional neural networks, capable of tackling this problem.
