Real-Time Hand Gesture Detection and Recognition Using Bag-of-Features and Support Vector Machine Techniques

This paper presents a novel real-time system for interacting with an application or video game via hand gestures. The system detects and tracks a bare hand against a cluttered background using skin detection and a hand posture contour comparison algorithm after face subtraction, recognizes hand gestures via bag-of-features and a multiclass support vector machine (SVM), and builds a grammar that generates gesture commands to control an application. In the training stage, keypoints are extracted from every training image using the scale-invariant feature transform (SIFT); after K-means clustering, a vector quantization technique maps the keypoints of each training image into a histogram vector of unified dimension (bag-of-words). This histogram is treated as the input vector for a multiclass SVM to build the trained classifier. In the testing stage, for every frame captured from a webcam, the hand is detected using our algorithm; keypoints are then extracted from the small image that contains only the detected hand gesture and fed into the cluster model to map them into a bag-of-words vector, which is finally fed into the trained multiclass SVM classifier to recognize the hand gesture.
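The vector quantization step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`kmeans`, `bag_of_words`, and the helpers) are hypothetical, the K-means routine is a plain Lloyd-style loop, and toy 2-D points stand in for real 128-dimensional SIFT descriptors. The point it shows is that images with different numbers of keypoints all map to a histogram of the same length k, which is what allows a multiclass SVM (not shown here) to take them as input.

```python
import random
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_center(point: Vector, centers: List[Vector]) -> int:
    """Index of the cluster center (visual word) closest to the descriptor."""
    return min(range(len(centers)),
               key=lambda i: squared_distance(point, centers[i]))


def kmeans(points: List[Vector], k: int, iterations: int = 20,
           seed: int = 0) -> List[List[float]]:
    """Plain Lloyd-style K-means; returns k cluster centers (the codebook)."""
    rng = random.Random(seed)
    # Initialise the centers with k distinct training descriptors.
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            clusters[nearest_center(p, centers)].append(p)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                dim = len(members[0])
                centers[i] = [sum(m[d] for m in members) / len(members)
                              for d in range(dim)]
    return centers


def bag_of_words(descriptors: List[Vector],
                 centers: List[Vector]) -> List[int]:
    """Count how many descriptors fall on each visual word."""
    histogram = [0] * len(centers)
    for d in descriptors:
        histogram[nearest_center(d, centers)] += 1
    return histogram


if __name__ == "__main__":
    # Toy stand-ins for keypoint descriptors pooled from all training images.
    training = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
                (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
    codebook = kmeans(training, k=2)
    # Two "images" with different keypoint counts give equal-length vectors.
    print(bag_of_words(training[:4], codebook))
    print(bag_of_words(training[2:], codebook))
```

In a full system, the histograms produced for the training images would be paired with their gesture labels to train the classifier, and the same codebook would be reused unchanged on every webcam frame at test time.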
