Learning 3D Compact Binary Descriptor for Human Action Recognition in Video

Hand-crafted descriptors are currently widely used for human action recognition in video. However, they are not optimized and may lack discriminative information. To compensate for this drawback, this paper presents a learning-based 3D compact binary descriptor (3D-CBD) for representing human action videos. The proposed descriptor is a 3D extension of the compact binary face descriptor (CBFD). Given a video sequence, we first extract pixel difference vectors (PDVs) in local volumes and then learn a feature mapping that projects these PDVs into low-dimensional binary vectors. Finally, we cluster these binary codes and pool them into a histogram feature that serves as the representation of the video sequence. Experimental results on two action datasets (KTH and WEIZMANN) demonstrate the effectiveness of the proposed descriptor.
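The three stages described above (PDV extraction, binary projection, clustering and pooling) can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the 3x3x3 neighborhood, the PCA-plus-sign projection (a simple stand-in for the learned CBFD-style mapping, whose objective is not given here), the plain Lloyd's k-means, and all function names and parameter values are assumptions chosen for illustration.

```python
import numpy as np

def extract_pdvs(video, radius=1):
    """Pixel difference vectors: neighbor minus center over a local 3D volume.
    video is a (T, H, W) array; returns (N, (2r+1)^3 - 1) differences."""
    T, H, W = video.shape
    r = radius
    center = video[r:T - r, r:H - r, r:W - r]
    diffs = []
    for dt in range(-r, r + 1):
        for dy in range(-r, r + 1):
            for dx in range(-r, r + 1):
                if dt == 0 and dy == 0 and dx == 0:
                    continue  # skip the center voxel itself
                neighbor = video[r + dt:T - r + dt, r + dy:H - r + dy, r + dx:W - r + dx]
                diffs.append((neighbor - center).ravel())
    return np.stack(diffs, axis=1)

def learn_projection(pdvs, n_bits=8):
    """ASSUMPTION: PCA directions stand in for the learned feature mapping."""
    mean = pdvs.mean(axis=0)
    _, _, vt = np.linalg.svd(pdvs - mean, full_matrices=False)
    return mean, vt[:n_bits].T  # shape (D, n_bits)

def binarize(pdvs, mean, proj):
    """Project PDVs and threshold at zero to obtain binary codes."""
    return ((pdvs - mean) @ proj > 0).astype(np.uint8)

def kmeans(x, k, n_iter=10, seed=0):
    """Plain Lloyd's k-means, used here to build a codebook of binary codes."""
    rng = np.random.default_rng(seed)
    x = x.astype(np.float64)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # leave empty clusters unchanged
                centers[j] = members.mean(axis=0)
    return centers

def pool_histogram(codes, centers):
    """Assign each code to its nearest codeword and pool into a normalized histogram."""
    x = codes.astype(np.float64)
    dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    hist = np.bincount(dist.argmin(axis=1), minlength=len(centers)).astype(np.float64)
    return hist / hist.sum()

# Usage on a synthetic clip (random values, not real video data).
rng = np.random.default_rng(0)
video = rng.random((8, 12, 12))
pdvs = extract_pdvs(video)
mean, proj = learn_projection(pdvs, n_bits=8)
codes = binarize(pdvs, mean, proj)
centers = kmeans(codes, k=16)
hist = pool_histogram(codes, centers)
```

In this sketch the projection and codebook are fit on the same clip for brevity; in practice both would presumably be learned on training videos and then applied to unseen sequences before classification.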
