Ordering of Visual Descriptors in a Classifier Cascade Towards Improved Video Concept Detection

Concept detection for the semantic annotation of video fragments (e.g., keyframes) is a popular and challenging problem. A variety of visual features is typically extracted and combined in order to learn the relation between feature-based keyframe representations and semantic concepts. In recent years the pool of available features has grown rapidly, and features based on deep convolutional neural networks, combined with other visual descriptors, have contributed significantly to improved concept detection accuracy. This work proposes an algorithm that dynamically selects, orders and combines many base classifiers, each trained independently on a different feature-based keyframe representation, in a cascade architecture for video concept detection. The proposed cascade is more accurate than state-of-the-art classifier combination approaches and computationally more efficient, in that it requires fewer classifier evaluations.
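The general idea of a classifier cascade with early rejection can be illustrated with a minimal sketch. It is not the algorithm proposed in this work: the function name `cascade_score`, the use of a running mean to combine stage scores, and the per-stage rejection thresholds are all assumptions made for illustration, and the stage order is taken as given rather than learned.

```python
from typing import Callable, List, Sequence, Tuple

# A stage pairs a base classifier (here any function that maps a keyframe
# representation to a score in [0, 1]) with a rejection threshold.
# Both the structure and the names are hypothetical.
Stage = Tuple[Callable[[Sequence[float]], float], float]


def cascade_score(keyframe: Sequence[float], stages: List[Stage]) -> Tuple[float, int]:
    """Score a keyframe with an ordered cascade of base classifiers.

    Returns the combined score and the number of classifiers evaluated.
    Assumption: stage scores are combined by a running mean, and the
    keyframe is rejected as soon as that mean falls below the threshold
    of the current stage.
    """
    if not stages:
        raise ValueError("the cascade needs at least one stage")
    total = 0.0
    mean = 0.0
    evaluated = 0
    for scorer, threshold in stages:
        total += scorer(keyframe)
        evaluated += 1
        mean = total / evaluated
        if mean < threshold:
            break  # early rejection: later classifiers are never evaluated
    return mean, evaluated


# Toy stages: each "classifier" simply reads one entry of the input vector.
toy_stages: List[Stage] = [
    (lambda x: x[0], 0.25),
    (lambda x: x[1], 0.5),
    (lambda x: x[2], 0.0),
]

rejected = cascade_score([0.125, 1.0, 1.0], toy_stages)  # stops after stage 1
accepted = cascade_score([0.75, 0.5, 0.25], toy_stages)  # passes all stages
```

In this sketch the saving in classifier evaluations comes entirely from early rejection: a keyframe that scores poorly at an early stage never reaches the later, possibly more expensive, classifiers.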
