Action Recognition with Shot Boundary Detection and Decoded iDT Features

We report our results on the THUMOS Challenge 2014 Action Recognition Task. Given an untrimmed video as input, our method recognizes its major action category in four main steps: (1) detect shot boundaries within the video; (2) decode the pre-computed indices of iDTFs within each shot; (3) encode the recovered iDTFs into shot-wise Fisher vectors and compute shot-to-category classification scores; (4) summarize the shot-wise classification scores into a video-wise classification score. We report results for four different strategies for summarizing the shot-wise scores into the video-wise classification score.
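Step (4) can be illustrated with a minimal sketch. The abstract does not name the four summarization strategies, so the three pooling rules below (mean, max, and a shot-length-weighted mean) are illustrative assumptions and not necessarily the ones evaluated here. The function names `summarize_shot_scores` and `predict_category`, and the use of shot length in frames as a weight, are likewise hypothetical.

```python
from typing import List, Sequence


def summarize_shot_scores(
    shot_scores: Sequence[Sequence[float]],
    shot_lengths: Sequence[int],
    strategy: str = "mean",
) -> List[float]:
    """Pool per-shot category scores into one score per category for the video.

    shot_scores[i][c] is the classifier score of shot i for category c.
    shot_lengths[i] is the length of shot i in frames (assumed weighting signal).
    The strategies are illustrative assumptions, not the paper's actual four.
    """
    if not shot_scores:
        raise ValueError("need at least one shot")
    if len(shot_scores) != len(shot_lengths):
        raise ValueError("one length per shot is required")

    num_categories = len(shot_scores[0])
    # Transpose: each entry holds the scores of one category across all shots.
    per_category = [[shot[c] for shot in shot_scores] for c in range(num_categories)]

    if strategy == "mean":
        return [sum(col) / len(col) for col in per_category]
    if strategy == "max":
        return [max(col) for col in per_category]
    if strategy == "length_weighted":
        total = sum(shot_lengths)
        return [
            sum(score * length for score, length in zip(col, shot_lengths)) / total
            for col in per_category
        ]
    raise ValueError(f"unknown strategy: {strategy}")


def predict_category(video_scores: Sequence[float]) -> int:
    """Return the index of the highest-scoring category."""
    return max(range(len(video_scores)), key=lambda c: video_scores[c])


# Toy example: three shots, two categories (values are made up).
toy_scores = [[1.0, 0.0], [0.0, 3.0], [2.0, 0.0]]
toy_lengths = [10, 10, 20]
for name in ("mean", "max", "length_weighted"):
    pooled = summarize_shot_scores(toy_scores, toy_lengths, name)
    print(name, pooled, predict_category(pooled))
```

On the toy input, max pooling favors the second category because of one confident short shot, while length weighting favors the first. This shows why the choice of summarization strategy can change the video-level prediction.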
