A tensor motion descriptor based on histograms of gradients and optical flow

This paper presents a new tensor motion descriptor that uses only optical flow and HOG3D information: no interest points are extracted, and it does not rely on a visual dictionary. We propose a new tensor-based aggregation technique that combines two tensor descriptors. The first represents motion through the coefficients of a polynomial approximation of the optical flow. The second represents the accumulated data of all the histograms of gradients in the video. The descriptor is evaluated by classifying the KTH, UCF11 and Hollywood2 datasets with an SVM classifier. Our method reaches a recognition rate of 93.2% on KTH, comparable to the best local approaches. On the UCF11 and Hollywood2 datasets, it achieves fairly competitive results compared to local and learning-based approaches.
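A minimal sketch of the double aggregation, in standard-library Python. It rests on assumptions that the abstract does not state. Each per-frame vector (a flattened HOG3D histogram, or the polynomial coefficients of that frame's optical flow) becomes an orientation tensor v v^T. These tensors are summed over the whole video and normalised by their Frobenius norm. The flow is projected onto a hypothetical low-degree basis {1, x, y, xy}, which is orthogonal on a grid centred at the origin. The two tensors are then combined by concatenating their upper triangles. All function names are hypothetical, and the paper's actual polynomial basis, degree, block layout and normalisation may differ.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]  # row-major grid (flow component) or square tensor


def aggregate_tensor(vectors: Sequence[Sequence[float]]) -> Matrix:
    """Sum the orientation tensors v v^T of all vectors, then divide by the Frobenius norm."""
    if not vectors:
        raise ValueError("need at least one vector")
    n = len(vectors[0])
    acc = [[0.0] * n for _ in range(n)]
    for v in vectors:
        for i in range(n):
            for j in range(n):
                acc[i][j] += v[i] * v[j]
    norm = math.sqrt(sum(x * x for row in acc for x in row))
    if norm > 0:
        acc = [[x / norm for x in row] for row in acc]
    return acc


def flow_coefficients(u: Matrix, v: Matrix) -> List[float]:
    """Project both flow components (H x W grids, H, W >= 2) onto the basis {1, x, y, xy}.

    Assumption: this tiny basis stands in for the paper's polynomial basis. On a grid
    symmetric about the origin its members are mutually orthogonal, so each coefficient
    is an inner product divided by the basis function's squared norm.
    """
    h, w = len(u), len(u[0])
    xs = [2 * j / (w - 1) - 1 for j in range(w)]  # columns mapped to [-1, 1]
    ys = [2 * i / (h - 1) - 1 for i in range(h)]  # rows mapped to [-1, 1]
    basis = [lambda x, y: 1.0, lambda x, y: x, lambda x, y: y, lambda x, y: x * y]
    coeffs: List[float] = []
    for comp in (u, v):
        for p in basis:
            num = sum(comp[i][j] * p(xs[j], ys[i]) for i in range(h) for j in range(w))
            den = sum(p(xs[j], ys[i]) ** 2 for i in range(h) for j in range(w))
            coeffs.append(num / den)
    return coeffs


def upper_triangle(t: Matrix) -> List[float]:
    """Flatten a symmetric tensor without repeating its mirrored entries."""
    n = len(t)
    return [t[i][j] for i in range(n) for j in range(i, n)]


def video_descriptor(
    flows: Sequence[Tuple[Matrix, Matrix]],
    histograms: Sequence[Sequence[float]],
) -> List[float]:
    """Hypothetical double aggregation: motion tensor and gradient tensor, concatenated."""
    motion = aggregate_tensor([flow_coefficients(u, v) for u, v in flows])
    gradient = aggregate_tensor(histograms)
    return upper_triangle(motion) + upper_triangle(gradient)
```

Every frame contributes to both tensors, so no interest points or codebook are needed. The output length is fixed regardless of the video's duration: with 8 flow coefficients and histograms of length m, it is 36 + m(m+1)/2. This fixed length is what allows the vector to be passed directly to an SVM. Because both tensors are symmetric, keeping only the upper triangle loses no information.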
