Hand Detection using Deformable Part Models on an Egocentric Perspective

The egocentric perspective is a recent viewpoint enabled by devices such as the GoPro and Google Glass, which are becoming increasingly available to the public. Hands are the most consistently visible objects in egocentric video, and they carry rich information about people and their activities, but the nature of the perspective and the ever-changing shape of the hands make them difficult to detect. Previous work has focused on indoor environments or controlled data, which simplify the problem; in this work we use data with changing backgrounds and variable illumination, which is more challenging. We generate hand proposals with an approach based on Deformable Part Models, since it can handle the many gestures a hand can adopt and rivals other techniques at locating hands while producing fewer proposals. We further reduce the number of detections using priors on where hands appear in the image and on their size. Finally, a CNN classifier removes the remaining false positives to produce the final hand detections.
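The three-stage pipeline described in the abstract (DPM proposals, then spatial/size pruning, then CNN verification) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the proposal list and the `cnn_is_hand` scorer are hypothetical stand-ins, and all thresholds (e.g. the assumption that hands tend to occupy mid-sized boxes in the lower part of the egocentric frame) are illustrative values, not taken from the paper.

```python
# Sketch of the hand-detection pipeline: DPM-style proposals are pruned
# by location/size priors, then verified by a CNN classifier.
# All thresholds and the stand-in scoring functions are assumptions.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Box:
    """A candidate detection in pixel coordinates with a proposal score."""
    x: float
    y: float
    w: float
    h: float
    score: float


def prune_by_prior(boxes: List[Box], frame_w: int, frame_h: int,
                   min_area_frac: float = 0.01,
                   max_area_frac: float = 0.35,
                   min_center_y_frac: float = 0.4) -> List[Box]:
    """Keep proposals whose size and location match where egocentric hands
    typically appear: mid-sized boxes in the lower portion of the frame.
    The fractions here are illustrative, not values from the paper."""
    kept = []
    for b in boxes:
        area_frac = (b.w * b.h) / (frame_w * frame_h)
        center_y_frac = (b.y + b.h / 2.0) / frame_h
        if (min_area_frac <= area_frac <= max_area_frac
                and center_y_frac >= min_center_y_frac):
            kept.append(b)
    return kept


def detect_hands(frame_w: int, frame_h: int,
                 dpm_proposals: List[Box],
                 cnn_is_hand: Callable[[Box], float],
                 cnn_threshold: float = 0.5) -> List[Box]:
    """Full pipeline: prune DPM proposals by spatial/size priors, then keep
    only the boxes the CNN classifier accepts as hands."""
    candidates = prune_by_prior(dpm_proposals, frame_w, frame_h)
    return [b for b in candidates if cnn_is_hand(b) >= cnn_threshold]
```

For example, on a 640x480 frame a plausible proposal in the lower half survives both stages, while a tiny box near the top of the frame and a frame-sized box are pruned before the classifier ever runs, which is the point of the intermediate priors: fewer proposals reach the (expensive) CNN stage.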
