Multi-scale prediction for robust hand detection and classification

In this paper, we present a multi-scale fully convolutional network (MSP-RFCN) to robustly detect and classify human hands under various challenging conditions. The input image is passed through the proposed network to generate score maps based on multi-scale predictions. The network is specifically designed to handle small objects, and its architecture builds on region proposals generated at multiple scales. We evaluate our method on two challenging hand datasets, the Vision for Intelligent Vehicles and Applications (VIVA) Challenge and the Oxford hand dataset, and compare it against recent hand detection algorithms. The experimental results demonstrate that the proposed method achieves state-of-the-art detection for hands of various sizes.
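
To make the idea of multi-scale score maps concrete, the following is a minimal sketch and not the MSP-RFCN implementation. The abstract does not say how predictions from different scales are combined, so this example assumes the simplest option: each score map is resized to the finest resolution by nearest-neighbour lookup and the maps are averaged. The function names `upsample_nearest` and `fuse_score_maps` and the toy values are hypothetical.

```python
# Illustrative sketch only: fuse per-scale score maps into a single map.
# Averaging is an assumption; the paper's actual fusion rule may differ.

def upsample_nearest(score_map, out_h, out_w):
    """Resize a 2-D list of scores to (out_h, out_w) by nearest-neighbour lookup."""
    in_h, in_w = len(score_map), len(score_map[0])
    return [
        [score_map[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def fuse_score_maps(score_maps):
    """Average score maps of different resolutions at the finest resolution."""
    out_h = max(len(m) for m in score_maps)
    out_w = max(len(m[0]) for m in score_maps)
    resized = [upsample_nearest(m, out_h, out_w) for m in score_maps]
    n = len(resized)
    return [
        [sum(m[r][c] for m in resized) / n for c in range(out_w)]
        for r in range(out_h)
    ]


# Toy inputs: a coarse 1x2 map and a fine 2x4 map of hand scores.
coarse = [[0.2, 0.8]]
fine = [[0.0, 0.4, 0.6, 1.0],
        [0.2, 0.2, 0.8, 0.8]]
fused = fuse_score_maps([coarse, fine])  # 2x4 map at the fine resolution
```

In this toy setting the coarse map contributes context over a wide area while the fine map keeps the localization needed for small hands; a learned fusion layer could replace the plain average.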
