Fast and Robust Dynamic Hand Gesture Recognition via Key Frames Extraction and Feature Fusion

Gesture recognition is an active topic in computer vision and pattern recognition and plays a vital role in natural human-computer interfaces. Although great progress has been made recently, fast and robust hand gesture recognition remains an open problem, since existing methods do not balance performance and efficiency well. To bridge this gap, this work combines image entropy and density clustering to extract key frames from hand gesture videos for subsequent feature extraction, which improves recognition efficiency. Moreover, a feature fusion strategy is proposed to strengthen the feature representation, which improves recognition performance. To validate our approach in a "wild" environment, we also introduce two new datasets, HandGesture and Action3D. Experiments consistently demonstrate that our strategy achieves competitive results on the Northwestern University, Cambridge, HandGesture and Action3D hand gesture datasets. Our code and datasets will be released at this https URL.
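The key frame idea above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which density clustering algorithm is used or what it is applied to, so this example assumes a density-peaks style score computed on the per-frame entropy values alone. The function names `image_entropy` and `density_peak_key_frames` and the parameters `cutoff` and `num_keys` are hypothetical.

```python
import math
from collections import Counter


def image_entropy(pixels):
    """Shannon entropy (in bits) of a flat sequence of gray levels."""
    counts = Counter(pixels)
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def density_peak_key_frames(entropies, cutoff, num_keys):
    """Return indices of frames whose entropy values act as density peaks.

    Assumption: frames are compared only by their scalar entropy, and a
    frame's score is rho * delta, where rho is its local density and delta
    is its distance to the nearest higher-ranked frame.
    """
    n = len(entropies)

    def dist(i, j):
        return abs(entropies[i] - entropies[j])

    # Local density: number of other frames within the cutoff distance.
    rho = [sum(1 for j in range(n) if j != i and dist(i, j) < cutoff)
           for i in range(n)]

    delta = []
    for i in range(n):
        # Higher-ranked frames: denser, or equally dense with a smaller index.
        higher = [j for j in range(n) if (rho[j], -j) > (rho[i], -i)]
        if higher:
            delta.append(min(dist(i, j) for j in higher))
        else:
            # The top-ranked frame gets the largest distance to any frame.
            delta.append(max(dist(i, j) for j in range(n)))

    score = [r * d for r, d in zip(rho, delta)]
    ranked = sorted(range(n), key=lambda i: score[i], reverse=True)
    return sorted(ranked[:num_keys])
```

In use, one would compute `image_entropy` for each grayscale frame of a video, pass the resulting list to `density_peak_key_frames`, and run feature extraction only on the returned frame indices. Suitable values for `cutoff` and `num_keys` depend on the data and are left open here.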
