Cascade of classifiers based on binary, non-binary and deep convolutional network descriptors for video concept detection

In this paper, we propose a cascade architecture for training and combining different visual descriptors (local binary, local non-binary, and deep convolutional neural network (DCNN)-based) for video concept detection. The proposed architecture is computationally more efficient than typical state-of-the-art video concept detection systems, without affecting detection accuracy. In addition, we present a detailed study on combining DCNN-based descriptors with other popular local descriptors, both within a cascade and under different late-fusion schemes. We evaluate our methods on the extensive video dataset of the 2013 TRECVID Semantic Indexing Task.
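The general idea of a classifier cascade with early rejection can be illustrated with a minimal sketch. This is not the architecture proposed in the paper: the stage ordering, the rejection thresholds, the averaging used as a stand-in for late fusion, and all names (`cascade_score`, `toy_stages`) are assumptions made purely for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# A stage is a (scorer, rejection_threshold) pair. Both are hypothetical:
# in a real system each scorer would be a classifier trained on one
# descriptor type (e.g. binary, non-binary, or DCNN-based features).
Stage = Tuple[Callable[[Sequence[float]], float], float]


def cascade_score(sample: Sequence[float], stages: List[Stage]) -> Tuple[float, int]:
    """Run stages in order and stop at the first score below its threshold.

    Returns (fused_score, number_of_stages_evaluated). The fused score is
    the mean of the scores computed so far, an assumed fusion rule.
    """
    if not stages:
        raise ValueError("at least one stage is required")
    scores = []
    for scorer, threshold in stages:
        score = scorer(sample)
        scores.append(score)
        if score < threshold:
            # Early rejection: later (assumed more expensive) stages are skipped.
            break
    return sum(scores) / len(scores), len(scores)


# Toy stages: each "classifier" simply reads one precomputed score.
toy_stages: List[Stage] = [
    (lambda s: s[0], 0.3),  # cheapest stage
    (lambda s: s[1], 0.4),  # intermediate stage
    (lambda s: s[2], 0.0),  # most expensive stage, never rejects here
]

rejected = cascade_score((0.1, 0.9, 0.9), toy_stages)  # stops after stage 1
accepted = cascade_score((0.8, 0.6, 0.7), toy_stages)  # passes all 3 stages
```

In this sketch the computational saving comes from samples such as the first one, which are rejected by the cheapest stage so that the costlier stages are never evaluated for them.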
