An embedded implementation of CNN-based hand detection and orientation estimation algorithm

Abstract

Hand detection is an essential step in supporting many tasks, including HCI applications. However, robustly detecting diverse hands under cluttered backgrounds, motion blur, or changing illumination remains a challenging problem. Recently, object detection methods based on CNN models have significantly improved the accuracy of hand detection, but at a high computational cost. In this paper, we propose a lightweight CNN that uses a modified MobileNet as the feature extractor together with the SSD framework to achieve robust and fast detection of hand location and orientation. The network generates a set of feature maps at different resolutions to detect hands of different sizes. To improve robustness, we also employ a top-down feature fusion architecture that integrates context information across feature levels. To estimate hand orientation accurately with a CNN, we estimate the projections of two orthogonal vectors onto the horizontal and vertical axes and then recover the size and orientation of a bounding box that tightly encloses the hand. To deploy the detection algorithm on the embedded platform Jetson TK1, we optimize the implementations of the building modules of the CNN. Evaluated on the challenging Oxford hand dataset, our method (the code is available at https://github.com/yangli18/hand_detection) reaches 83.2% average precision at 139 FPS on an NVIDIA Titan X, outperforming previous methods in both accuracy and efficiency. The embedded implementation of our algorithm reaches a processing speed of 16 FPS, which largely meets the requirement for real-time processing.
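The abstract does not state how hands of different sizes are assigned to the multi-resolution feature maps. In the standard SSD framework, each feature map receives one default-box scale, spaced linearly between a minimum and a maximum fraction of the input size, with several aspect ratios per scale. The sketch below illustrates only that generic SSD rule: s_min = 0.2 and s_max = 0.9 follow the original SSD setup, while the six feature maps, the aspect ratios (1, 2, 0.5) and the helper names `default_box_scales` and `default_boxes` are hypothetical choices, not the settings of this hand detector.

```python
import math


def default_box_scales(num_maps, s_min=0.2, s_max=0.9):
    """Linearly spaced SSD-style box scales, one per feature map.

    Each scale is a fraction of the input image side; coarser (later)
    feature maps get larger scales and therefore match larger hands.
    """
    if num_maps == 1:
        return [s_min]
    step = (s_max - s_min) / (num_maps - 1)
    return [s_min + step * k for k in range(num_maps)]


def default_boxes(scale, aspect_ratios=(1.0, 2.0, 0.5)):
    """(width, height) of the default boxes for one scale, as image fractions.

    Width/height equals the aspect ratio; the box area stays scale**2.
    """
    return [(scale * math.sqrt(ar), scale / math.sqrt(ar)) for ar in aspect_ratios]


if __name__ == "__main__":
    # Hypothetical configuration: six feature maps, as in the original SSD.
    for k, s in enumerate(default_box_scales(6)):
        print(f"map {k}: scale={s:.2f}", default_boxes(s))
```

Keeping the area fixed at scale squared while varying the aspect ratio lets each map cover one size band with several box shapes.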

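The orientation step can be illustrated with one plausible parameterisation. This is an assumption, not the paper's exact formulation. Treat u and v as the two orthogonal half-axis vectors of the rotated box, u = (w/2)(cos t, sin t) and v = (h/2)(-sin t, cos t), and let the network regress their horizontal and vertical projections (ux, uy, vx, vy). The width and height then follow from the vector lengths and the angle from their directions. The axis-aligned box enclosing the rotated box has half-extents |ux| + |vx| and |uy| + |vy|. All helper names below are hypothetical.

```python
import math


def recover_rotated_box(ux, uy, vx, vy):
    """Recover (width, height, angle) of a rotated box from the horizontal
    and vertical projections of its two half-axis vectors u and v.

    Assumed parameterisation (hypothetical, not taken from the paper):
        u = (w / 2) * ( cos t, sin t)   # half-axis along the box width
        v = (h / 2) * (-sin t, cos t)   # half-axis along the box height
    """
    w = 2.0 * math.hypot(ux, uy)
    h = 2.0 * math.hypot(vx, vy)
    # Rotating v by -90 degrees gives (vy, -vx), which points along u for an
    # ideal box. Summing both vectors averages their directions (weighted by
    # length), so slightly non-orthogonal predictions still yield one angle.
    angle = math.atan2(uy - vx, ux + vy)
    return w, h, angle


def box_corners(cx, cy, ux, uy, vx, vy):
    """Four corners of the rotated box centred at (cx, cy), in cyclic order."""
    return [(cx + su * ux + sv * vx, cy + su * uy + sv * vy)
            for su, sv in ((1, 1), (-1, 1), (-1, -1), (1, -1))]


def enclosing_half_extents(ux, uy, vx, vy):
    """Half-width and half-height of the axis-aligned box enclosing the
    rotated box: the summed absolute projections on each axis."""
    return abs(ux) + abs(vx), abs(uy) + abs(vy)


if __name__ == "__main__":
    # Synthetic example: a 40 x 20 box rotated by 30 degrees.
    t = math.radians(30)
    ux, uy = 20 * math.cos(t), 20 * math.sin(t)
    vx, vy = -10 * math.sin(t), 10 * math.cos(t)
    w, h, angle = recover_rotated_box(ux, uy, vx, vy)
    print(round(w, 3), round(h, 3), round(math.degrees(angle), 3))  # prints 40.0 20.0 30.0
    print(box_corners(0.0, 0.0, ux, uy, vx, vy))
    print(enclosing_half_extents(ux, uy, vx, vy))
```

Combining the direction of u with v rotated by -90 degrees is only one simple way to tolerate predictions that are not exactly orthogonal. Trusting u alone would be an equally plausible choice.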