Extreme Learning Machine for Large-Scale Action Recognition

In this paper, we describe the method we applied to the action recognition task of the THUMOS 2014 challenge. We study human action recognition in RGB videos using low-level features, focusing on improved trajectory features densely extracted from the spatio-temporal volume. We represent each video with a Fisher vector encoding augmented with additional mid-level features, and we use Extreme Learning Machines for classification. Our method achieves 62.27% mean average precision on the validation set.
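As a rough illustration of the final classification step, the following is a minimal sketch of a single-hidden-layer Extreme Learning Machine trained on per-video feature vectors (e.g., Fisher vectors). The hidden-layer size, regularization constant, and function names are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def train_elm(X, Y, n_hidden=1024, C=1.0, rng=None):
    """Train a single-hidden-layer ELM.

    X : (n_samples, n_features) video representations (e.g., Fisher vectors).
    Y : (n_samples, n_classes) one-hot class labels.
    Returns the random input weights/biases and the learned output weights.
    """
    rng = np.random.default_rng(rng)
    n_features = X.shape[1]
    # Hidden-layer weights and biases are drawn randomly and never updated.
    W = rng.standard_normal((n_features, n_hidden))
    b = rng.standard_normal(n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))  # sigmoid hidden activations
    # Output weights solved in closed form via regularized least squares.
    beta = np.linalg.solve(H.T @ H + np.eye(n_hidden) / C, H.T @ Y)
    return W, b, beta

def predict_elm(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    scores = H @ beta
    return scores.argmax(axis=1), scores
```

The appeal of the ELM here is that only the output weights are learned, in a single least-squares solve, which keeps training cheap even for large-scale video collections.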
