Efficient Framework for Action Recognition Using Reduced Fisher Vector Encoding

This paper presents a novel and efficient approach to recognizing human actions in video that improves performance through an unorthodox combination of stage-level approaches. Feature descriptors computed along dense trajectories, namely HOG, HOF, and MBH, are known to represent videos successfully. In this work, a reduced-dimension Fisher Vector encoding is obtained separately for each of these descriptors, and the encodings are concatenated to form one super vector representing each video. To limit the dimension of this super vector, only the first-order statistics, computed with respect to the Gaussian Mixture Model, are included in the individual Fisher Vectors. Finally, the elements of this super vector are fed as inputs to a Deep Belief Network (DBN) classifier. The performance of this setup is evaluated on the KTH and Weizmann datasets. Experimental results show a significant improvement on these datasets: an accuracy of 98.92% on KTH and 100% on Weizmann.
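The encoding step described above can be illustrated with a minimal sketch. It assumes a diagonal-covariance GMM whose weights, means, and standard deviations have already been fitted for each descriptor type, and it uses the standard first-order Fisher Vector formula (gradient with respect to the means only). The function names, the toy dimensions, and the number of mixture components are illustrative assumptions, not values taken from the paper, and the normalisation steps often applied to Fisher Vectors are omitted.

```python
import numpy as np


def gmm_posteriors(X, weights, means, sigmas):
    """Soft assignments gamma[t, k] of T descriptors to K diagonal Gaussians.

    X: (T, D) descriptors; weights: (K,); means, sigmas: (K, D).
    """
    diff = (X[:, None, :] - means[None, :, :]) / sigmas[None, :, :]  # (T, K, D)
    # Log-density up to the constant -0.5 * D * log(2 * pi), which is the
    # same for every component and cancels in the normalisation below.
    log_p = -0.5 * np.sum(diff ** 2, axis=2) - np.sum(np.log(sigmas), axis=1)
    log_p = log_p + np.log(weights)
    log_p = log_p - log_p.max(axis=1, keepdims=True)  # numerical stability
    gamma = np.exp(log_p)
    return gamma / gamma.sum(axis=1, keepdims=True)


def first_order_fisher_vector(X, weights, means, sigmas):
    """First-order Fisher Vector: K * D values instead of the usual 2 * K * D.

    G_k = 1 / (T * sqrt(w_k)) * sum_t gamma[t, k] * (x_t - mu_k) / sigma_k
    """
    T = X.shape[0]
    gamma = gmm_posteriors(X, weights, means, sigmas)
    diff = (X[:, None, :] - means[None, :, :]) / sigmas[None, :, :]
    fv = np.einsum("tk,tkd->kd", gamma, diff)
    fv = fv / (T * np.sqrt(weights)[:, None])
    return fv.ravel()


def video_super_vector(descriptors, gmms, names=("HOG", "HOF", "MBH")):
    """Concatenate the per-descriptor Fisher Vectors of one video."""
    parts = [first_order_fisher_vector(descriptors[n], *gmms[n]) for n in names]
    return np.concatenate(parts)


# Toy usage with random data; all sizes here are arbitrary (hypothetical).
rng = np.random.default_rng(0)
dims = {"HOG": 4, "HOF": 5, "MBH": 6}
K = 3
descriptors = {n: rng.normal(size=(10, d)) for n, d in dims.items()}
gmms = {
    n: (np.full(K, 1.0 / K), rng.normal(size=(K, d)), np.ones((K, d)))
    for n, d in dims.items()
}
super_vector = video_super_vector(descriptors, gmms)
print(super_vector.shape)
```

Dropping the second-order (variance) terms halves the length of each Fisher Vector, which is what keeps the concatenated super vector small enough to serve as the classifier input.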
