Human 3D Pose Estimation with a Tilting Camera for Social Mobile Robot Interaction

Human–robot interaction is a cornerstone of mobile robotics, especially within the field of social robots. In this context, localizing the user is crucial for the interaction. This work investigates the capabilities of wide field-of-view RGB cameras to estimate the 3D position and orientation (i.e., the pose) of a user in the environment. To that end, we employ a social robot endowed with a fish-eye camera hosted in a tilting head and develop two complementary approaches: (1) a fast method that relies on a single image, estimates the user pose from the detection of their feet, and requires neither the robot nor the user to remain static during the reconstruction; and (2) a method that takes several views of the scene while the camera is being tilted and does not need the feet to be visible. Due to the particular setup of the tilting camera, special equations for 3D reconstruction have been developed. In both approaches, a CNN-based skeleton detector (OpenPose) is employed to identify humans within the image. A set of experiments with real data validates our two proposed methods, which yield results similar to those of commercial RGB-D cameras while surpassing them in coverage of the scene (wider FoV and longer range) and robustness to lighting conditions.
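As a rough illustration of the geometric idea behind approach (1), the sketch below intersects the viewing ray of a detected foot pixel with the floor plane. It is not the paper's formulation: it assumes the fish-eye image has already been undistorted to a pinhole model, that the camera sits at a known height above a flat floor, and that the tilt is a pure pitch about the horizontal axis. The function name `feet_pixel_to_ground` and all of its parameters are hypothetical.

```python
import math

def feet_pixel_to_ground(u, v, fx, fy, cx, cy, cam_height, tilt_rad):
    """Intersect the viewing ray of pixel (u, v) with the floor plane Z = 0.

    Assumptions (illustrative only, not taken from the paper):
      - pinhole intrinsics (fx, fy, cx, cy) after undistorting the image;
      - camera frame: x right, y down, z forward;
      - world frame: X forward, Y left, Z up, camera at (0, 0, cam_height);
      - tilt_rad > 0 pitches the camera downwards.
    Returns (X, Y) on the floor, or None if the ray never reaches the floor.
    """
    # Ray direction in the camera frame (normalized image coordinates).
    xc = (u - cx) / fx
    yc = (v - cy) / fy
    zc = 1.0

    # Express the ray in the world frame for an untilted camera.
    x0, y0, z0 = zc, -xc, -yc

    # Apply the downward pitch (rotation about the world Y axis).
    c, s = math.cos(tilt_rad), math.sin(tilt_rad)
    xw = c * x0 + s * z0
    yw = y0
    zw = -s * x0 + c * z0

    # The ray must point downwards to hit the floor.
    if zw >= -1e-9:
        return None

    # Scale the ray so that it descends exactly cam_height.
    t = cam_height / -zw
    return (t * xw, t * yw)


# Example: camera 1 m above the floor, pitched 45 degrees down,
# foot detected at the principal point of a 640x480 image.
point = feet_pixel_to_ground(320, 240, 500.0, 500.0, 320, 240, 1.0, math.pi / 4)
```

With these values the ray along the optical axis meets the floor about 1 m in front of the camera. A horizontal or upward-pointing ray returns `None`, which loosely mirrors the case where the feet are not visible and a multi-view method such as approach (2) would be needed instead.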
