Optical Flow Co-occurrence Matrices: A novel spatiotemporal feature descriptor

A suitable feature representation is essential for video analysis and understanding in smart surveillance applications. In this paper, we propose a novel spatiotemporal feature descriptor based on co-occurrence matrices computed from optical flow magnitude and orientation. Our method, called Optical Flow Co-occurrence Matrices (OFCM), captures local space-time characteristics of motion from the magnitude and orientation of neighboring optical flow vectors. It describes the resulting flow patterns with a robust set of measures known as Haralick features, which quantify properties of the co-occurrence matrices such as contrast, entropy and homogeneity. We evaluate the proposed method on action recognition using a visual recognition pipeline based on a bag of local spatiotemporal features and SVM classification. Experimental results on three well-known datasets (KTH, UCF Sports and HMDB51) demonstrate that OFCM outperforms several widely used spatiotemporal feature descriptors, such as HOF, HOG3D and MBH, indicating its suitability as a video representation.
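The abstract does not specify OFCM's quantization scheme, neighborhood offsets, or how the matrices are aggregated over space and time, so the following is only a minimal sketch of the core idea on a single 2D flow patch. It assumes uniform bins for flow magnitude (scaled by the patch maximum) and for orientation (over [0, 2π)), a single spatial offset (dx, dy), and only three of Haralick's fourteen measures, with homogeneity taken as the inverse difference moment and entropy using the natural logarithm. All function names are hypothetical and are not the authors' code.

```python
import math
from typing import Dict, List, Sequence

Grid = List[List[int]]
Matrix = List[List[float]]


def quantize(value: float, lo: float, hi: float, levels: int) -> int:
    """Map a value in [lo, hi] to an integer level in 0..levels-1 (uniform bins)."""
    if hi <= lo:
        return 0
    level = int((value - lo) / (hi - lo) * levels)
    return max(0, min(level, levels - 1))


def cooccurrence(grid: Grid, levels: int, dx: int, dy: int) -> Matrix:
    """Normalized co-occurrence matrix: P[i][j] is the fraction of pixel pairs
    (y, x) -> (y + dy, x + dx) whose quantized values are i and j."""
    counts = [[0.0] * levels for _ in range(levels)]
    total = 0
    height, width = len(grid), len(grid[0])
    for y in range(height):
        for x in range(width):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < height and 0 <= x2 < width:
                counts[grid[y][x]][grid[y2][x2]] += 1.0
                total += 1
    if total == 0:
        return counts
    return [[c / total for c in row] for row in counts]


def haralick_subset(p: Matrix) -> Dict[str, float]:
    """Contrast, entropy (natural log) and homogeneity (inverse difference moment)."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    return {
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "entropy": -sum(v * math.log(v) for _, _, v in cells if v > 0),
        "homogeneity": sum(v / (1.0 + (i - j) ** 2) for i, j, v in cells),
    }


def flow_cooccurrence_features(u: Sequence[Sequence[float]],
                               v: Sequence[Sequence[float]],
                               mag_levels: int = 8, ori_levels: int = 8,
                               dx: int = 1, dy: int = 0) -> Dict[str, float]:
    """Hypothetical helper: quantize flow magnitude and orientation on one patch,
    build one co-occurrence matrix per channel for the offset (dx, dy), and
    collect the Haralick measures of both matrices into one feature dict."""
    mag = [[math.hypot(a, b) for a, b in zip(ru, rv)] for ru, rv in zip(u, v)]
    # atan2 returns angles in [-pi, pi]; shift them to [0, 2*pi) before binning.
    ori = [[math.atan2(b, a) % (2 * math.pi) for a, b in zip(ru, rv)]
           for ru, rv in zip(u, v)]
    max_mag = max(max(row) for row in mag)
    mag_q = [[quantize(m, 0.0, max_mag, mag_levels) for m in row] for row in mag]
    ori_q = [[quantize(o, 0.0, 2 * math.pi, ori_levels) for o in row] for row in ori]
    features: Dict[str, float] = {}
    for name, grid, levels in (("mag", mag_q, mag_levels), ("ori", ori_q, ori_levels)):
        for key, value in haralick_subset(cooccurrence(grid, levels, dx, dy)).items():
            features[f"{name}_{key}"] = value
    return features


if __name__ == "__main__":
    # Toy 4x4 flow patch: even columns move right, odd columns move up.
    u = [[1.0 if x % 2 == 0 else 0.0 for x in range(4)] for _ in range(4)]
    v = [[0.0 if x % 2 == 0 else 1.0 for x in range(4)] for _ in range(4)]
    for name, value in sorted(flow_cooccurrence_features(u, v).items()):
        print(f"{name}: {value:.4f}")
```

In a full pipeline along the lines the abstract describes, per-patch vectors like these would be computed around sampled spatiotemporal locations (possibly with temporal as well as spatial offsets), encoded with a bag-of-features codebook, and passed to an SVM; those stages are omitted from this sketch.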
