ADSC Submission at THUMOS Challenge 2015

This notebook paper describes our approaches to the action recognition and temporal localization tasks of the THUMOS Challenge 2015. For the action recognition task, we use the subsequence-score distribution (SSD) framework. We use the Improved Fisher Vector (IFV) encoding of Improved Dense Trajectories (IDTs) to capture motion, and a VGG-16 deep network to extract a 4096-dimensional feature vector that captures context information. A linear SVM is trained to classify video clips into the 101 action categories. For the temporal localization task, we compute the IFV encoding at 9 different temporal scales and apply the above SVM to obtain a pyramid score descriptor. These score features are used to generate action labels at the frame level, and with suitable post-processing we detect actions from the 20 classes in the given videos.
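
The temporal localization steps summarized above (scoring at several temporal scales, frame-level labeling, post-processing) can be illustrated with a minimal Python sketch. It is not our actual implementation: mean pooling of per-frame features stands in for the IFV encoding, and the window lengths, the averaging over scales, the background threshold, and the run-merging step are all assumptions made for illustration. The function names are hypothetical.

```python
import numpy as np


def pyramid_scores(frame_feats, W, b, scales):
    """Score every frame at several temporal scales with a linear classifier.

    frame_feats: (T, D) per-frame features (assumed stand-in for the real encoding).
    W: (C, D) weights and b: (C,) biases of a linear classifier.
    scales: window lengths in frames (hypothetical values).
    Returns an array of shape (T, len(scales), C).
    """
    T, D = frame_feats.shape
    # Prefix sums give the sum over any window with one subtraction.
    prefix = np.vstack([np.zeros((1, D)), np.cumsum(frame_feats, axis=0)])
    out = np.empty((T, len(scales), W.shape[0]))
    for s_idx, win in enumerate(scales):
        half = win // 2
        for t in range(T):
            lo = max(0, t - half)
            hi = min(T, t + half + 1)
            pooled = (prefix[hi] - prefix[lo]) / (hi - lo)  # mean over the window
            out[t, s_idx] = W @ pooled + b
    return out


def frame_labels(pyramid, threshold=0.0):
    """Average scores over scales; assign the best class, or -1 for background.

    Averaging and thresholding are assumptions, not the paper's stated rule.
    """
    mean_scores = pyramid.mean(axis=1)  # (T, C)
    labels = mean_scores.argmax(axis=1)
    labels[mean_scores.max(axis=1) < threshold] = -1
    return labels


def labels_to_segments(labels):
    """Merge runs of equal non-background labels into (start, end, cls), end exclusive."""
    segments = []
    start = 0
    for t in range(1, len(labels) + 1):
        if t == len(labels) or labels[t] != labels[start]:
            if labels[start] != -1:
                segments.append((start, t, int(labels[start])))
            start = t
    return segments
```

Calling `pyramid_scores` with nine window lengths yields a 9-level score pyramid per frame. `frame_labels` and `labels_to_segments` then turn that pyramid into candidate action segments.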