Action Recognition from Optical Flow Visualizations

Optical flow is an important computer vision technique used for motion estimation, object tracking and activity recognition. In this paper, we study how effective optical flow is for recognizing simple actions when only its RGB visualization is given as input to a deep neural network. Feeding the network optical flow visualizations, rather than the raw video content, ensures that a single motion feature is the only classification criterion. We treat human action recognition as a multi-class classification problem. To categorize an action, we train an AlexNet-like Convolutional Neural Network (CNN) on Farnebäck optical flow visualizations of the action videos. We use the KTH dataset, which contains six types of action videos: walking, running, boxing, jogging, hand-clapping and hand-waving. The accuracy obtained on the test set is 84.72%. This is below the state of the art, as expected when only a single motion feature is used for classification, but it is high enough to show that optical flow visualization is a good distinguishing criterion for action recognition. The AlexNet-like CNN was trained in Caffe on two NVIDIA Quadro K4200 GPU cards, and the Farnebäck optical flow features were computed with the OpenCV library.
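As an illustration of what an "RGB visualization" of optical flow can look like, the sketch below maps a dense flow field to colors using a common convention: hue encodes the direction of motion and brightness encodes its magnitude. The abstract does not state which color mapping was used, so this convention, the function name flow_to_rgb and the toy flow values are all assumptions made for the example. In practice the flow field would come from a dense estimator such as OpenCV's cv2.calcOpticalFlowFarneback; the example uses only the Python standard library so that it runs as written.

```python
import colorsys
import math


def flow_to_rgb(flow, max_magnitude=None):
    """Map a dense flow field to an RGB image (hypothetical helper).

    `flow` is a list of rows, each row a list of (dx, dy) displacements.
    Hue encodes direction, brightness encodes magnitude relative to
    `max_magnitude` (defaults to the largest magnitude in the field).
    Returns rows of (r, g, b) tuples with values in 0..255.
    """
    magnitudes = [[math.hypot(dx, dy) for dx, dy in row] for row in flow]
    if max_magnitude is None:
        max_magnitude = max((m for row in magnitudes for m in row), default=0.0)

    image = []
    for row, mags in zip(flow, magnitudes):
        out_row = []
        for (dx, dy), mag in zip(row, mags):
            # Direction in [0, 2*pi) mapped to a hue in [0, 1).
            angle = math.atan2(dy, dx) % (2 * math.pi)
            hue = angle / (2 * math.pi)
            # Static pixels (zero motion) come out black.
            value = min(mag / max_magnitude, 1.0) if max_magnitude > 0 else 0.0
            r, g, b = colorsys.hsv_to_rgb(hue, 1.0, value)
            out_row.append((round(r * 255), round(g * 255), round(b * 255)))
        image.append(out_row)
    return image


# Toy 2x2 flow field (assumed values): rightward, static, leftward, slow rightward.
toy_flow = [[(1.0, 0.0), (0.0, 0.0)], [(-1.0, 0.0), (0.5, 0.0)]]
toy_image = flow_to_rgb(toy_flow)
```

Under this mapping, opposite directions of motion receive complementary hues and static background is black, so an image of this kind carries only motion information and none of the original appearance of the frame.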
