Contextual Attention for Hand Detection in the Wild

We present Hand-CNN, a novel convolutional network architecture for detecting hand masks and predicting hand orientations in unconstrained images. Hand-CNN extends MaskRCNN with a novel attention mechanism to incorporate contextual cues in the detection process. This attention mechanism can be implemented as an efficient network module that captures non-local dependencies between features. This network module can be inserted at different stages of an object detection network, and the entire detector can be trained end-to-end. We also introduce large-scale annotated hand datasets containing hands in unconstrained images for training and evaluation. We show that Hand-CNN outperforms existing methods on the newly collected datasets and the publicly available PASCAL VOC human layout dataset. Data and code: https://www3.cs.stonybrook.edu/~cvl/projects/hand_det_attention/

[1]  Andrew Zisserman,et al.  Long Term Arm and Hand Tracking for Continuous Sign Language TV Broadcasts , 2008, BMVC.

[2]  Richard Bowden,et al.  Large Lexicon Detection of Sign Language , 2007, ICCV-HCI.

[3]  Oliver Deussen,et al.  Estimating 2D Multi-hand Poses from Single Depth Images , 2018, ECCV Workshops.

[4]  Yinda Zhang,et al.  Joint Hand Detection and Rotation Estimation by Using CNN , 2016, ArXiv.

[5]  Gang Sun,et al.  Squeeze-and-Excitation Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[6]  Jitendra Malik,et al.  Deformable part models are convolutional neural networks , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Ai Poh Loh,et al.  Attention Based Detection and Recognition of Hand Postures Against Complex Backgrounds , 2012, International Journal of Computer Vision.

[8]  Marios Savvides,et al.  Multiple Scale Faster-RCNN Approach to Driver’s Cell-Phone Usage and Hands on Steering Wheel Detection , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[9]  Minh Hoai,et al.  BusyHands: A Hand-Tool Interaction Database for Assembly Tasks Semantic Segmentation , 2019, ArXiv.

[10]  Andrew Zisserman,et al.  Hand detection using multiple proposals , 2011, BMVC.

[11]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[12]  Thomas Brox,et al.  Learning to Estimate 3D Hand Pose from Single RGB Images , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[13]  Luc Van Gool,et al.  The Pascal Visual Object Classes Challenge: A Retrospective , 2014, International Journal of Computer Vision.

[14]  Richard Bowden,et al.  A boosted classifier tree for hand shape detection , 2004, Sixth IEEE International Conference on Automatic Face and Gesture Recognition, 2004. Proceedings..

[15]  Ken Perlin,et al.  Real-Time Continuous Pose Recovery of Human Hands Using Convolutional Networks , 2014, ACM Trans. Graph..

[16]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Stefan Lee,et al.  Lending A Hand: Detecting Hands and Recognizing Activities in Complex Egocentric Interactions , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[18]  Andrew Zisserman,et al.  Talking Heads: Detecting Humans and Recognizing Their Interactions , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  Kaiming He,et al.  Mask R-CNN , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[20]  Andrew Zisserman,et al.  Thread-Safe: Towards Recognizing Human Actions Across Shot Boundaries , 2014, ACCV.

[21]  Andrew Zisserman,et al.  Spatial Transformer Networks , 2015, NIPS.

[22]  Vincent Lepetit,et al.  HandSeg: A Dataset for Hand Segmentation from Depth Images , 2017, ArXiv.

[23]  Abhinav Gupta,et al.  Non-local Neural Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[24]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[25]  Minh Hoai,et al.  WorkingHands: A Hand-Tool Assembly Dataset for Image Segmentation and Activity Mining , 2019, BMVC.

[26]  Shuicheng Yan,et al.  A2-Nets: Double Attention Networks , 2018, NeurIPS.

[27]  Ron Kimmel,et al.  Rule of thumb: Deep derotation for improved fingertip detection , 2015, BMVC.

[28]  James M. Rehg,et al.  Delving into egocentric actions , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Rajiv Ranjan Sahay,et al.  Deep Learning Based Hand Detection in Cluttered Environment Using Skin Segmentation , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[30]  Andrea Tagliasacchi,et al.  HandSeg: An Automatically Labeled Dataset for Hand Segmentation from Depth Images , 2017, 2019 16th Conference on Computer and Robot Vision (CRV).

[31]  Mathias Kölsch,et al.  Robust hand detection , 2004, Sixth IEEE International Conference on Automatic Face and Gesture Recognition, 2004. Proceedings..

[32]  Andrew Zisserman,et al.  Efficient discriminative learning of parts-based models , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[33]  Ying Wu,et al.  An Adaptive Self-Organizing Color Segmentation Algorithm with Application to Robust Real-time Human Hand Localization , 2000 .

[34]  Shimon Ullman,et al.  The chains model for detecting parts by their context , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[35]  Alexander H. Waibel,et al.  Segmenting hands of arbitrary color , 2000, Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580).

[36]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[37]  Yaser Sheikh,et al.  Hand Keypoint Detection in Single Images Using Multiview Bootstrapping , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).