Feature detector and descriptor evaluation in human action recognition

In this paper, we evaluate and compare different feature detection and feature description methods for part-based approaches in human action recognition. Many methods have been proposed in the literature, both for detecting space-time interest points and for describing local video patches, but it remains unclear which of them performs best for human action recognition. For feature detection, we compare Dollar's method [18], Laptev's method [22], a bank of 3D-Gabor filters [6] and a method based on Space-Time Differences of Gaussians. For feature description, we compare the Gradient [18], HOG-HOF [22] and 3D SIFT [24] descriptors and an enhanced version of LBP-TOP [15]. We show that combining Dollar's detector with the improved LBP-TOP descriptor is computationally efficient and achieves the best recognition accuracy on the KTH database.
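To make the descriptor family favoured by this evaluation more concrete, the following Python sketch computes a basic LBP-TOP histogram for a small video patch. Basic LBP codes are computed on the three orthogonal planes (XY, XT, YT) through each interior voxel, accumulated into three 256-bin histograms, and concatenated. This is a simplified illustration under stated assumptions, not the enhanced LBP-TOP evaluated in this paper. It uses a square 3x3 neighbourhood (radius 1 in every plane) instead of circularly sampled neighbours, applies no uniform-pattern mapping, and the names `lbp_code` and `lbp_top` are hypothetical.

```python
from typing import List

# A video patch stored as nested lists, indexed as vol[t][y][x].
Volume = List[List[List[float]]]

# Offsets of the 8 neighbours in a 3x3 square neighbourhood, clockwise from the
# top-left corner. Assumption/simplification: standard LBP-TOP samples P points
# on a circle of radius R in each plane; here R = 1 with square sampling.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(center: float, neighbours: List[float]) -> int:
    """Basic LBP code: bit p is set when neighbour p is >= the centre value."""
    code = 0
    for p, g in enumerate(neighbours):
        if g >= center:
            code |= 1 << p
    return code


def lbp_top(vol: Volume) -> List[float]:
    """Hypothetical helper: three 256-bin LBP histograms (XY, XT, YT planes),
    each normalised to sum to 1, concatenated into a 768-value descriptor."""
    T, H, W = len(vol), len(vol[0]), len(vol[0][0])
    if min(T, H, W) < 3:
        raise ValueError("patch must be at least 3x3x3")
    hists = [[0] * 256 for _ in range(3)]  # order: XY, XT, YT
    for t in range(1, T - 1):
        for y in range(1, H - 1):
            for x in range(1, W - 1):
                c = vol[t][y][x]
                xy = [vol[t][y + a][x + b] for a, b in OFFSETS]  # t fixed
                xt = [vol[t + a][y][x + b] for a, b in OFFSETS]  # y fixed
                yt = [vol[t + a][y + b][x] for a, b in OFFSETS]  # x fixed
                for i, nb in enumerate((xy, xt, yt)):
                    hists[i][lbp_code(c, nb)] += 1
    n = (T - 2) * (H - 2) * (W - 2)  # number of interior voxels
    return [count / n for hist in hists for count in hist]


if __name__ == "__main__":
    # Intensity grows along x only, so XY and XT codes are 62 and YT codes 255.
    ramp = [[[float(x) for x in range(5)] for _ in range(5)] for _ in range(4)]
    desc = lbp_top(ramp)
    print(len(desc), desc[62], desc[256 + 62], desc[512 + 255])  # 768 1.0 1.0 1.0
```

In a part-based pipeline, a descriptor like this would be computed on the local patch around each detected interest point. Keeping the three plane histograms separate preserves appearance (XY) and motion (XT, YT) cues and avoids the 2^(3P) code space that a full 3D neighbourhood would produce.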

[1] Thomas Serre et al. Object recognition with features inspired by visual cortex, 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[2] C. Schmid et al. Description of Interest Regions with Center-Symmetric Local Binary Patterns, 2006, ICVGIP.

[3] Matthijs C. Dorst. Distinctive Image Features from Scale-Invariant Keypoints, 2011.

[4] Marko Heikkilä et al. Description of interest regions with local binary patterns, 2009, Pattern Recognit.

[5] Matti Pietikäinen et al. Face Description with Local Binary Patterns: Application to Face Recognition, 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6] Cordelia Schmid et al. Learning realistic human actions from movies, 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[7] Serge J. Belongie et al. Behavior recognition via sparse spatio-temporal features, 2005, 2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance.

[8] Yuxiao Hu et al. Searching Human Behaviors using Spatial-Temporal Words, 2007, 2007 IEEE International Conference on Image Processing.

[9] Mubarak Shah et al. Chaotic Invariants for Human Action Recognition, 2007, 2007 IEEE 11th International Conference on Computer Vision.

[10] Matti Pietikäinen et al. Dynamic Texture Recognition Using Local Binary Patterns with an Application to Facial Expressions, 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11] Ivan Laptev et al. On Space-Time Interest Points, 2005, International Journal of Computer Vision.

[12] Eli Shechtman et al. Space-time behavior based correlation, 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[13] Pietro Perona et al. A Bayesian hierarchical model for learning natural scene categories, 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[14] Cordelia Schmid et al. A Performance Evaluation of Local Descriptors, 2005, IEEE Trans. Pattern Anal. Mach. Intell.

[15] Cordelia Schmid et al. Evaluation of Local Spatio-temporal Features for Action Recognition, 2009, BMVC.

[16] Matti Pietikäinen et al. A comparative study of texture measures with classification based on featured distributions, 1996, Pattern Recognit.

[17] Matti Pietikäinen et al. Face Recognition with Local Binary Patterns, 2004, ECCV.

[18] Mubarak Shah et al. A 3-dimensional SIFT descriptor and its application to action recognition, 2007, ACM Multimedia.

[19] Roberto Cipolla et al. Extracting Spatiotemporal Interest Points using Global Information, 2007, 2007 IEEE 11th International Conference on Computer Vision.

[20] Ling Shao et al. Human Action Recognition Using LBP-TOP as Sparse Spatio-Temporal Feature Descriptor, 2009, CAIP.

[21] Barbara Caputo et al. Recognizing human actions: a local SVM approach, 2004, Proceedings of the 17th International Conference on Pattern Recognition (ICPR 2004).

[22] Chih-Jen Lin et al. LIBSVM: A library for support vector machines, 2011, TIST.

[23] Cordelia Schmid et al. Actions in context, 2009, CVPR.

[24] Matti Pietikäinen et al. Multiresolution Gray-Scale and Rotation Invariant Texture Classification with Local Binary Patterns, 2002, IEEE Trans. Pattern Anal. Mach. Intell.

[25] Ammad Ali et al. Face Recognition with Local Binary Patterns, 2012.

[26] Ghassan Hamarneh et al. N-SIFT: N-Dimensional Scale Invariant Feature Transform for Matching Medical Images, 2007, ISBI.

[27] Christopher G. Harris et al. A Combined Corner and Edge Detector, 1988, Alvey Vision Conference.

[28] Juan Carlos Niebles et al. Unsupervised Learning of Human Action Categories, 2006.

[29] Jitendra Malik et al. Recognizing action at a distance, 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[30] Ivan Laptev et al. Local Descriptors for Spatio-temporal Recognition, 2004, SCVMA.

[31] Cordelia Schmid et al. A Comparison of Affine Region Detectors, 2005, International Journal of Computer Vision.