Hand disambiguation using attention neural networks in the egocentric perspective

With the development of wearable cameras, a new setting has emerged: the egocentric perspective. It brings with it the computer vision task of detecting hands and disambiguating left from right. To address this task, we use an attention network that combines several egocentric hand properties to make the final classification. These hand features are motivated by the egocentric perspective and include the hand's location in the image, the hand's size, the constraint that at most one object of each hand class is present, and the probability that each hand appears in the image. In addition, we use the YOLO object detector and its Tiny version to measure their impact on overall performance and on speed, which matters for wearable devices. Finally, we compare them with current object and hand detection approaches.
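As a rough illustration of the hand properties listed above, the sketch below derives a normalised location and size from a bounding box and applies the "at most one object of each hand class" constraint by keeping the highest-scoring detection per class. It is not the attention network described here: the `Detection` type, the `hand_features` and `keep_one_per_class` helpers, the score dictionary, and the greedy selection rule are all hypothetical names and choices made for this example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Detection:
    # Bounding box in pixels: top-left corner plus width and height.
    x: float
    y: float
    w: float
    h: float
    # Hypothetical per-class scores, e.g. {"left": 0.7, "right": 0.3}.
    scores: Dict[str, float]


def hand_features(det: Detection, img_w: int, img_h: int) -> List[float]:
    """Return [centre_x, centre_y, area], each normalised by the image size."""
    cx = (det.x + det.w / 2) / img_w
    cy = (det.y + det.h / 2) / img_h
    area = (det.w * det.h) / (img_w * img_h)
    return [cx, cy, area]


def keep_one_per_class(dets: List[Detection]) -> Dict[str, Detection]:
    """Keep at most one detection per hand class (highest score wins).

    This greedy rule is an assumption made for illustration only.
    """
    best: Dict[str, Detection] = {}
    for det in dets:
        label = max(det.scores, key=det.scores.get)
        if label not in best or det.scores[label] > best[label].scores[label]:
            best[label] = det
    return best


# Example: three candidate boxes in a 640x480 frame, two of them "left".
candidates = [
    Detection(0, 0, 100, 100, {"left": 0.9, "right": 0.1}),
    Detection(500, 300, 140, 180, {"left": 0.2, "right": 0.8}),
    Detection(50, 50, 50, 50, {"left": 0.6, "right": 0.4}),
]
selected = keep_one_per_class(candidates)
features = {label: hand_features(d, 640, 480) for label, d in selected.items()}
```

In a learned model, features like these would be inputs to the classifier rather than hard rules; the greedy filter only shows what the uniqueness constraint means in practice.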