Group Sparse Ensemble Learning for Visual Concept Detection

To exploit the hidden group structures of data and thus detect concepts in videos, this paper proposes a novel group sparse ensemble learning approach based on Automatic Group Sparse Coding (AutoGSC). We first adopt AutoGSC to learn both a common dictionary over different data groups and an individual group-specific dictionary for each data group which can help us to capture the discrimination information contained in different data groups. Next, we represent each data instance by using a sparse linear combination of both dictionaries. Finally, we propose an algorithm to use the reconstruction errors of data instances to calculate the ensemble gating function for ensemble construction and fusion. Experiments on the TRECVid 2008 benchmark show that the ensemble learning proposal achieves promising results and outperforms existing approaches.

[1]  Sheng Tang,et al.  Sparse Ensemble Learning for Concept Detection , 2012, IEEE Transactions on Multimedia.

[2]  Chong-Wah Ngo,et al.  Representations of Keypoint-Based Semantic Concept Detection: A Comprehensive Study , 2010, IEEE Transactions on Multimedia.

[3]  Pedro M. Domingos A few useful things to know about machine learning , 2012, Commun. ACM.

[4]  Sheng Tang,et al.  TRECVID 2007 High-Level Feature Extraction By MCG-ICT-CAS , 2007, TRECVID.

[5]  Sheng Tang,et al.  Ensemble Learning with LDA Topic Models for Visual Concept Detection , 2012 .

[6]  Bart Thomee,et al.  New trends and ideas in visual concept detection: the MIR flickr retrieval evaluation initiative , 2010, MIR '10.

[7]  Samy Bengio,et al.  Group Sparse Coding , 2009, NIPS.

[8]  Jimeng Sun,et al.  Automatic Group Sparse Coding , 2011, AAAI.

[9]  Sheng Tang,et al.  Localized Multiple Kernel Learning for Realistic Human Action Recognition in Videos , 2011, IEEE Transactions on Circuits and Systems for Video Technology.

[10]  John R. Smith,et al.  IBM Research TRECVID-2009 Video Retrieval System , 2009, TRECVID.

[11]  Haojie Li,et al.  Combining global and local matching of multiple features for precise item image retrieval , 2012, Multimedia Systems.

[12]  Paul Over,et al.  TRECVID 2008 - Goals, Tasks, Data, Evaluation Mechanisms and Metrics , 2010, TRECVID.

[13]  Dariu Gavrila,et al.  An Experimental Study on Pedestrian Classification , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  Sanjeev Khudanpur,et al.  TRECVID 2005 Experiment at Johns Hopkins University: Using Hidden Markov Models for Video Retrieval , 2005, TRECVID.

[15]  Dariu Gavrila,et al.  Monocular Pedestrian Detection: Survey and Experiments , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16]  Sheng Tang,et al.  Pornprobe: an LDA-SVM based pornography detection system , 2009, ACM Multimedia.

[17]  Marcel Worring,et al.  Content-Based Image Retrieval at the End of the Early Years , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[18]  Gang Wang,et al.  On the sampling of web images for learning visual concept classifiers , 2010, CIVR '10.