Synthesis of spatio-temporal descriptors for dynamic hand gesture recognition using genetic programming

Automatic gesture recognition has received much attention due to its potential in various applications. In this paper, we apply an evolutionary method, genetic programming (GP), to synthesize machine-learned spatio-temporal descriptors for automatic gesture recognition, instead of relying on hand-crafted descriptors. In our architecture, a set of primitive low-level 3D operators is first randomly assembled into tree-based combinations, which are then evolved generation by generation through the GP system; finally, the best-performing combination is selected as the descriptor for high-level gesture recognition. To the best of our knowledge, this is the first report of using GP to evolve spatio-temporal descriptors for gesture recognition. We treat this as a domain-independent optimization problem and evaluate the proposed method on two public dynamic gesture datasets, the Cambridge hand gesture dataset and the Northwestern University hand gesture dataset, to demonstrate its generalizability. The experimental results show that our GP-evolved descriptors achieve better recognition accuracy than state-of-the-art hand-crafted techniques.
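
The evolve-and-select loop described above can be illustrated with a minimal sketch. Everything concrete in it is an assumption made for illustration and is not taken from the paper: the four toy operators (`tdiff`, `abs`, `add`, `sub`), the per-frame mean pooling, the leave-one-out nearest-neighbour fitness, the synthetic two-class data, and all parameter values are hypothetical stand-ins for the actual 3D operator set, classifier, and datasets.

```python
import random

T, H, W = 4, 3, 3  # toy volume shape (frames x height x width), assumed

def tdiff(v):
    # Temporal forward difference; the last frame is padded with zeros.
    n = H * W
    return [v[i + n] - v[i] for i in range(len(v) - n)] + [0.0] * n

def absv(v):
    return [abs(x) for x in v]

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

# Hypothetical operator set: name -> (function, arity).
OPS = {"tdiff": (tdiff, 1), "abs": (absv, 1), "add": (add, 2), "sub": (sub, 2)}

def random_tree(rng, depth=3):
    # Grow a random expression tree; the terminal "x" is the input volume.
    if depth == 0 or rng.random() < 0.3:
        return "x"
    name = rng.choice(sorted(OPS))
    return (name,) + tuple(random_tree(rng, depth - 1) for _ in range(OPS[name][1]))

def evaluate(tree, x):
    if tree == "x":
        return x
    return OPS[tree[0]][0](*(evaluate(c, x) for c in tree[1:]))

def describe(tree, x):
    # Pool the transformed volume into one mean value per frame.
    out, n = evaluate(tree, x), H * W
    return [sum(out[t * n:(t + 1) * n]) / n for t in range(T)]

def fitness(tree, data):
    # Leave-one-out 1-nearest-neighbour accuracy on the descriptors.
    feats = [(describe(tree, x), y) for x, y in data]
    hits = 0
    for i, (f, y) in enumerate(feats):
        others = feats[:i] + feats[i + 1:]
        near = min(others, key=lambda o: sum((a - b) ** 2 for a, b in zip(f, o[0])))
        hits += near[1] == y
    return hits / len(feats)

def size(tree):
    return 1 if tree == "x" else 1 + sum(size(c) for c in tree[1:])

def subtrees(tree):
    yield tree
    if tree != "x":
        for c in tree[1:]:
            yield from subtrees(c)

def crossover(a, b, rng):
    # Replace a randomly chosen node of `a` with a random subtree of `b`.
    if a == "x" or rng.random() < 0.3:
        return rng.choice(list(subtrees(b)))
    i = rng.randrange(1, len(a))
    return a[:i] + (crossover(a[i], b, rng),) + a[i + 1:]

def mutate(tree, rng):
    # Replace a randomly chosen node with a fresh random subtree.
    if tree == "x" or rng.random() < 0.3:
        return random_tree(rng, 2)
    i = rng.randrange(1, len(tree))
    return tree[:i] + (mutate(tree[i], rng),) + tree[i + 1:]

def evolve(data, rng, pop_size=12, generations=8, max_size=25):
    pop = [random_tree(rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        fit = {t: fitness(t, data) for t in set(pop)}
        best = max(pop, key=fit.get)
        history.append(fit[best])
        nxt = [best]  # elitism: the best tree always survives
        while len(nxt) < pop_size:
            a = max(rng.sample(pop, 3), key=fit.get)  # tournament selection
            b = max(rng.sample(pop, 3), key=fit.get)
            child = mutate(crossover(a, b, rng), rng)
            nxt.append(child if size(child) <= max_size else a)
        pop = nxt
    return best, history

def toy_data(rng, per_class=4):
    # Class 0: static volume; class 1: intensity flickers between frames.
    data = []
    for label in (0, 1):
        for _ in range(per_class):
            vol = []
            for t in range(T):
                level = 1.0 if (label and t % 2) else 0.0
                vol += [level + rng.gauss(0, 0.1) for _ in range(H * W)]
            data.append((vol, label))
    return data

if __name__ == "__main__":
    rng = random.Random(0)
    data = toy_data(rng)
    best, history = evolve(data, rng)
    print("best tree:", best)
    print("best accuracy per generation:", history)
```

Trees are stored as nested tuples so that they are hashable and fitness values can be cached per generation. The size cap is a simple guard against tree bloat; a practical system would more likely rely on an established GP toolkit and a proper train/validation split.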
