A Pointing Gesture Based Egocentric Interaction System: Dataset, Approach and Application

With the rise of augmented reality (AR) and the growing popularity of smart head-mounted devices, natural human-device interaction, especially hand-gesture-based interaction, has become increasingly important. This paper presents a solution for pointing-gesture-based interaction in egocentric vision and its application. First, we establish a dataset named EgoFinger that focuses on the pointing gesture in egocentric vision. We describe the data collection process and provide a comprehensive analysis of the dataset, including background and foreground color distributions, hand occurrence likelihood, scale and pointing-angle distributions of the hand and finger, and an analysis of manual labeling error. The analysis shows that the dataset covers a substantial number of samples across varied environments and dynamic hand shapes. Furthermore, we propose a two-stage framework consisting of Faster R-CNN based hand detection followed by dual-target fingertip detection. Compared with state-of-the-art tracking and detection algorithms, it performs best on both hand and fingertip detection. Trained on the large-scale dataset, it achieves a fingertip detection error of about 12.22 pixels in a 640 px × 480 px video frame. Finally, using the fingertip detection results, we design and implement an input system for egocentric vision, Ego-Air-Writing. By treating the fingertip as a pen, a user wearing smart glasses can write characters in the air and interact with the system through simple hand gestures.
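To make the two-stage structure concrete, the sketch below shows the data flow only: a first stage that returns a hand bounding box, a second stage that localizes the fingertip inside the hand crop, and a mapping back to full-frame coordinates. The functions `detect_hand` and `locate_fingertip` are hypothetical stand-ins, not the authors' trained Faster R-CNN or dual-target models; they exist solely to illustrate how the stages would be chained.

```python
# Minimal structural sketch of a two-stage fingertip pipeline
# (hand detection -> crop -> fingertip localization -> map back).
# The two detectors are placeholder stand-ins, NOT the paper's models.
import numpy as np


def detect_hand(frame):
    """Stand-in for stage one (hand detection).
    Returns one bounding box (x, y, w, h) in frame coordinates."""
    h, w = frame.shape[:2]
    # Placeholder: pretend the hand occupies the central quarter of the frame.
    return (w // 4, h // 4, w // 2, h // 2)


def locate_fingertip(hand_crop):
    """Stand-in for stage two (fingertip localization on the hand crop).
    Returns (u, v) coordinates relative to the crop."""
    ch, cw = hand_crop.shape[:2]
    # Placeholder: pretend the fingertip sits near the top-center of the crop.
    return (cw // 2, ch // 8)


def fingertip_in_frame(frame):
    """Run both stages and map the crop-relative fingertip point
    back into full-frame coordinates."""
    x, y, w, h = detect_hand(frame)
    crop = frame[y:y + h, x:x + w]
    u, v = locate_fingertip(crop)
    return (x + u, y + v)


if __name__ == "__main__":
    frame = np.zeros((480, 640, 3), dtype=np.uint8)  # 640 x 480 frame as in the paper
    print(fingertip_in_frame(frame))                  # e.g. (320, 150)
```

In an application such as Ego-Air-Writing, the per-frame fingertip points returned by a pipeline of this shape would be accumulated into a trajectory and passed to a character recognizer; that downstream step is not shown here.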
