Local polynomial space–time descriptors for action classification

In this paper we propose to tackle human action indexing by introducing a new local motion descriptor based on a model of the optical flow. We propose to apply a coding step to vector field before the modeling. We use two models: a spatial model and a temporal model. The spatial model is computed by projection of optical flow onto bivariate orthogonal polynomials. Then, the time evolution of spatial coefficients is modeled with a one-dimensional polynomial basis. To perform the action classification, we extend recent still image signatures using local descriptors to our proposal and combine them with linear SVM classifiers. The experiments are carried out on the well-known UCF11 dataset and on the more challenging Hollywood2 action classification dataset and show promising results.

[1]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[2]  David Picard,et al.  Efficient image signatures and similarities using tensor products of local descriptors , 2013, Comput. Vis. Image Underst..

[3]  Ivan Laptev,et al.  Improving bag-of-features action recognition with non-local cues , 2010, BMVC.

[4]  Florent Perronnin,et al.  Modeling the spatial layout of images beyond spatial pyramids , 2012, Pattern Recognit. Lett..

[5]  Jiebo Luo,et al.  Recognizing realistic actions from videos “in the wild” , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  David Picard,et al.  Using spatial pyramids with compacted VLAT for image categorization , 2012, Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012).

[7]  Cordelia Schmid,et al.  Human Detection Using Oriented Histograms of Flow and Appearance , 2006, ECCV.

[8]  Geoffrey E. Hinton,et al.  Learning Generative Texture Models with extended Fields-of-Experts , 2009, BMVC.

[9]  Yihong Gong,et al.  Linear spatial pyramid matching using sparse coding for image classification , 2009, CVPR.

[10]  Andrew Zisserman,et al.  The devil is in the details: an evaluation of recent feature encoding methods , 2011, BMVC.

[11]  Dima Damen,et al.  Proceedings of the British Machine Vision Conference , 2014, BMVC 2014.

[12]  David Picard,et al.  Improving image similarity with vectors of locally aggregated tensors , 2011, 2011 18th IEEE International Conference on Image Processing.

[13]  Jiebo Luo,et al.  Recognizing realistic actions from videos , 2009, CVPR.

[14]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[15]  Cordelia Schmid,et al.  Aggregating local descriptors into a compact image representation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[16]  Olivier Kihl,et al.  Human activities discrimination with motion approximation in polynomial bases , 2010, 2010 IEEE International Conference on Image Processing.

[17]  R. Nelson,et al.  Low level recognition of human motion (or how to get your man without finding his body parts) , 1994, Proceedings of 1994 IEEE Workshop on Motion of Non-rigid and Articulated Objects.

[18]  Eli Shechtman,et al.  Space-Time Behavior-Based Correlation-OR-How to Tell If Two Underlying Motion Fields Are Similar Without Computing Them? , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  Jitendra Malik,et al.  Recognizing action at a distance , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[20]  Greg Mori,et al.  Action recognition by learning mid-level motion features , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Cordelia Schmid,et al.  Action recognition by dense trajectories , 2011, CVPR 2011.

[22]  Berthold K. P. Horn,et al.  Determining Optical Flow , 1981, Other Conferences.

[23]  Mubarak Shah,et al.  Human Action Recognition in Videos Using Kinematic Features and Multiple Instance Learning , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24]  Thomas S. Huang,et al.  Image Classification Using Super-Vector Coding of Local Image Descriptors , 2010, ECCV.

[25]  Olivier Kihl,et al.  Multivariate orthogonal polynomials to extract singular points , 2008, 2008 15th IEEE International Conference on Image Processing.

[26]  Matthieu Cord,et al.  BOSSA: Extended bow formalism for image classification , 2011, 2011 18th IEEE International Conference on Image Processing.

[27]  T Koga,et al.  MOTION COMPENSATED INTER-FRAME CODING FOR VIDEO CONFERENCING , 1981 .

[28]  Ronen Basri,et al.  Actions as space-time shapes , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[29]  Luc Van Gool,et al.  SURF: Speeded Up Robust Features , 2006, ECCV.

[30]  Cordelia Schmid,et al.  Evaluation of Local Spatio-temporal Features for Action Recognition , 2009, BMVC.

[31]  Cordelia Schmid,et al.  Aggregating Local Image Descriptors into Compact Codes , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[32]  James W. Davis,et al.  The representation and recognition of human movement using temporal templates , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[33]  Eli Shechtman,et al.  Space-time behavior based correlation , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[34]  Andrew Gilbert,et al.  Action Recognition Using Mined Hierarchical Compound Features , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[35]  Liang Wang,et al.  Learning and Matching of Dynamic Shape Manifolds for Human Action Recognition , 2007, IEEE Transactions on Image Processing.

[36]  Matti Pietikäinen,et al.  Human Activity Recognition Using a Dynamic Texture Based Method , 2008, BMVC.

[37]  Takeo Kanade,et al.  An Iterative Image Registration Technique with an Application to Stereo Vision , 1981, IJCAI.

[38]  Matti Pietikäinen,et al.  Texture Based Description of Movements for Activity Analysis , 2008, VISAPP.

[39]  Du Tran,et al.  Human Activity Recognition with Metric Learning , 2008, ECCV.

[40]  Barbara Caputo,et al.  Recognizing human actions: a local SVM approach , 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004..

[41]  Cordelia Schmid,et al.  Learning realistic human actions from movies , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[42]  Somayeh Danafar,et al.  Action Recognition for Surveillance Applications Using Optic Flow and SVM , 2007, ACCV.

[43]  Yihong Gong,et al.  Locality-constrained Linear Coding for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[44]  Ronen Basri,et al.  Actions as Space-Time Shapes , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[45]  James W. Davis,et al.  The Representation and Recognition of Action Using Temporal Templates , 1997, CVPR 1997.

[46]  Nazli Ikizler-Cinbis,et al.  Object, Scene and Actions: Combining Multiple Features for Human Action Recognition , 2010, ECCV.