Analysis of hand segmentation on challenging hand over face scenario

One of the challenging problems in computer vision is hand segmentation, especially when the hands overlap the face. Many applications require this type of segmentation, such as sign language recognition, action recognition, and recognition of the objects that hands interact with. Hand over face, in which the hands occlude the face, is a challenging scenario that can be used to test the performance of hand segmentation methods, yet little work has been done on this topic. After analyzing related hand segmentation datasets that include hands in front of or near the face, we introduce VLM-HandOverFace, our challenging public dataset for the hand-over-face segmentation problem. The new dataset contains 4384 annotated frames and includes color, depth, and infrared streams recorded with a Kinect sensor. The locations and shapes of the hands, captured with a Leap Motion sensor (an infrared hand-shape sensor), are also included. To analyze the new dataset, we compare two leading semantic segmentation methods, SegNet [3] and RefineNet [7]. Two experiments were executed: the first for hand/background segmentation and the second for right hand/left hand/background segmentation. On our new dataset, RefineNet achieves significantly better accuracy than SegNet, by 14%. Nonetheless, the highest accuracy achieved was 62.2%, demonstrating that VLM-HandOverFace is a challenging dataset for the current state of the art.
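The abstract does not state its label encoding or which "accuracy" metric underlies the reported 62.2%. As a hedged sketch only, assuming a hypothetical encoding (0 = background, 1 = right hand, 2 = left hand), the snippet below shows how the three-class labels of the second experiment could be collapsed into the hand/background labels of the first, and computes two common segmentation scores, pixel accuracy and mean intersection-over-union (IoU). Neither score is confirmed to be the metric used in the paper, and the function names are illustrative.

```python
from typing import List

Mask = List[List[int]]  # a segmentation mask as rows of integer class labels

# Hypothetical label encoding (assumption; the dataset's actual encoding may differ).
BACKGROUND, RIGHT_HAND, LEFT_HAND = 0, 1, 2
HAND = 1  # single foreground label used for the hand/background task


def to_hand_background(mask: Mask) -> Mask:
    """Collapse right hand/left hand/background labels into hand/background."""
    return [[HAND if label in (RIGHT_HAND, LEFT_HAND) else BACKGROUND for label in row]
            for row in mask]


def pixel_accuracy(pred: Mask, gt: Mask) -> float:
    """Fraction of pixels whose predicted label equals the ground-truth label."""
    total = correct = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            total += 1
            correct += int(p == g)
    return correct / total


def mean_iou(pred: Mask, gt: Mask, num_classes: int) -> float:
    """Mean intersection-over-union, averaged over classes present in pred or gt."""
    ious = []
    for c in range(num_classes):
        inter = union = 0
        for pred_row, gt_row in zip(pred, gt):
            for p, g in zip(pred_row, gt_row):
                inter += int(p == c and g == c)
                union += int(p == c or g == c)
        if union:  # skip classes absent from both masks
            ious.append(inter / union)
    return sum(ious) / len(ious)


if __name__ == "__main__":
    # Toy 2x4 masks: one right-hand pixel is predicted as left hand.
    gt = [[0, 0, 1, 1],
          [0, 2, 2, 1]]
    pred = [[0, 0, 1, 2],
            [0, 2, 2, 1]]
    print(pixel_accuracy(pred, gt))  # 0.875
    print(pixel_accuracy(to_hand_background(pred), to_hand_background(gt)))  # 1.0
    print(round(mean_iou(pred, gt, num_classes=3), 4))  # 0.7778
```

In this toy example, a single right/left confusion lowers both three-class scores but disappears entirely once the labels are collapsed to hand/background, which illustrates why the right hand/left hand/background experiment is the harder of the two.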

[1] Rana El Kaliouby, et al. Towards Communicative Face Occlusions: Machine Detection of Hand-over-Face Gestures, 2009, ICIAR.

[2] Stefan Lee, et al. Lending A Hand: Detecting Hands and Recognizing Activities in Complex Egocentric Interactions, 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[3] Roberto Cipolla, et al. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Robust Semantic Pixel-Wise Labelling, 2015, CVPR 2015.

[4] Thomas Brox, et al. U-Net: Convolutional Networks for Biomedical Image Segmentation, 2015, MICCAI.

[5] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.

[6] I. Maglogiannis, et al. Ratsnake: A Versatile Image Annotation Tool with Application to Computer-Aided Diagnosis, 2014, The Scientific World Journal.

[7] Ian D. Reid, et al. RefineNet: Multi-path Refinement Networks for High-Resolution Semantic Segmentation, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8] Seunghoon Hong, et al. Decoupled Deep Neural Network for Semi-supervised Semantic Segmentation, 2015, NIPS.

[9] Ali Borji, et al. Analysis of Hand Segmentation in the Wild, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[10] David B. Cooper, et al. Accurately Estimating Sherd 3D Surface Geometry with Application to Pot Reconstruction, 2003, 2003 Conference on Computer Vision and Pattern Recognition Workshop.

[11] Z. Liu, et al. A real time system for dynamic hand gesture recognition with a depth sensor, 2012, 2012 Proceedings of the 20th European Signal Processing Conference (EUSIPCO).

[12] Peter Robinson, et al. 3D Corpus of Spontaneous Complex Mental States, 2011, ACII.

[13] Ken Perlin, et al. Real-Time Continuous Pose Recovery of Human Hands Using Convolutional Networks, 2014, ACM Trans. Graph.

[14] Seunghoon Hong, et al. Learning Deconvolution Network for Semantic Segmentation, 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[15] Rob Fergus, et al. Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-scale Convolutional Architecture, 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[16] David J. Kriegman, et al. Localizing Parts of Faces Using a Consensus of Exemplars, 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17] Julian F. P. Kooij. SenseCap: Synchronized Data Collection with Microsoft Kinect2 and LeapMotion, 2016, ACM Multimedia.

[18] Vibhav Vineet, et al. Conditional Random Fields as Recurrent Neural Networks, 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[19] Andrew McCallum, et al. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data, 2001, ICML.

[20] James M. Rehg, et al. Learning to recognize objects in egocentric activities, 2011, CVPR 2011.

[21] Qi Ye, et al. BigHand2.2M Benchmark: Hand Pose Dataset and State of the Art Analysis, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22] Matilde Gonzalez, et al. Head Tracking and Hand Segmentation during Hand over Face Occlusion in Sign Language, 2010, ECCV Workshops.

[23] Charles E. Hughes, et al. Hand2Face: Automatic synthesis and recognition of hand over face occlusions, 2017, 2017 Seventh International Conference on Affective Computing and Intelligent Interaction (ACII).

[24] Tae-Kyun Kim, et al. Latent Regression Forest: Structured Estimation of 3D Articulated Hand Posture, 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.