An Integrative Framework of Human Hand Gesture Segmentation for Human–Robot Interaction

This paper proposes a novel framework for segmenting hand gestures in RGB-depth (RGB-D) images captured by Kinect, using humanlike approaches for human–robot interaction. The goal is to reduce Kinect sensing error and, consequently, to improve the precision of hand gesture segmentation for the NAO robot. The framework consists of two main novel approaches. First, the depth map and RGB image are aligned using a genetic algorithm to estimate key points; the alignment is robust to uncertainty in the number of extracted points. Second, the edges of the tracked hand gestures in RGB images are refined by applying a modified expectation–maximization (EM) algorithm based on Bayesian networks. The experimental results demonstrate that the proposed alignment method precisely matches the depth maps with the RGB images, and that the EM algorithm further adjusts the RGB edges of the segmented hand gestures. The framework has been integrated and validated in a human–robot interaction system to improve the NAO robot's performance in understanding and interpretation.
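The alignment step can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the chromosome encoding or the fitness function, so the translation-only model, the nearest-neighbor cost, and the names `alignment_cost` and `estimate_shift` are all assumptions made for illustration. The cost averages, over the depth key points, the squared distance to the nearest RGB key point, so the two sets do not need the same number of points.

```python
import random

def alignment_cost(shift, depth_pts, rgb_pts):
    """Mean squared distance from each shifted depth point to its nearest RGB point.

    Assumed fitness: nearest-neighbor matching tolerates unequal point counts.
    """
    tx, ty = shift
    total = 0.0
    for x, y in depth_pts:
        sx, sy = x + tx, y + ty
        total += min((sx - u) ** 2 + (sy - v) ** 2 for u, v in rgb_pts)
    return total / len(depth_pts)

def estimate_shift(depth_pts, rgb_pts, pop_size=40, generations=80,
                   span=30.0, seed=0):
    """Search for a 2D translation (tx, ty) with a simple genetic algorithm.

    Assumed model: translation only, searched within [-span, span] pixels.
    """
    rng = random.Random(seed)
    population = [(rng.uniform(-span, span), rng.uniform(-span, span))
                  for _ in range(pop_size)]
    sigma = span / 4  # mutation scale, shrunk every generation
    for _ in range(generations):
        # Selection: keep the best quarter of the population unchanged (elitism).
        population.sort(key=lambda p: alignment_cost(p, depth_pts, rgb_pts))
        elite = population[: pop_size // 4]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            w = rng.random()
            # Crossover: random blend of two parents; mutation: Gaussian noise.
            child = tuple(w * ga + (1 - w) * gb + rng.gauss(0, sigma)
                          for ga, gb in zip(a, b))
            children.append(child)
        population = elite + children
        sigma *= 0.93
    return min(population, key=lambda p: alignment_cost(p, depth_pts, rgb_pts))

# Hypothetical key points: six in the RGB image, only four found in the depth map.
rgb_pts = [(100, 120), (140, 95), (180, 160), (130, 200), (90, 170), (160, 130)]
depth_pts = [(x - 12, y + 7) for x, y in rgb_pts[:4]]
tx, ty = estimate_shift(depth_pts, rgb_pts)
print(f"estimated shift: ({tx:.2f}, {ty:.2f})")
```

In this synthetic setup the depth points were generated by shifting four of the RGB points by (-12, 7), so a good estimate should lie close to (12, -7). A real depth-to-RGB registration would typically also need scale and lens-distortion terms, which this sketch leaves out.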
