Active learning for human action retrieval using query pool selection

Content-Based Video Retrieval (CBVR) is attracting considerable research interest, driven by the need to manage the large amounts of video accumulating on the Internet. In this paper, we show that current state-of-the-art CBVR retrieval algorithms can be improved with active learning. Active learning algorithms ask the user for relevance feedback on specific items in the search database and use the newly labeled data points to improve the accuracy of the results for the user's original query. We propose a simple CBVR system with SVM relevance feedback and integrate it with active learning through a query pool selection algorithm based on two co-testing learners. Our experiments show that the system performs significantly better with active learning than without it, surpassing the state of the art.
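The co-testing idea behind the query pool can be illustrated with a minimal sketch. It assumes each video is described by two feature views, trains one learner per view on the items the user has already labeled, and treats unlabeled items on which the two learners disagree (contention points) as the candidate query pool. Several parts are assumptions, not the paper's method: a nearest-centroid scorer stands in for the SVM learners, contention points are ranked by smallest combined margin, and the fallback used when the learners never disagree is hypothetical. The names `train_view` and `co_testing_queries` are illustrative only.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
Scorer = Callable[[Vector], float]


def _mean(points: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def _sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_view(xs: List[Vector], labels: List[int]) -> Scorer:
    """Hypothetical stand-in for one view's SVM: a nearest-centroid scorer.

    Returns a signed score: > 0 means 'relevant' (label 1), < 0 means
    'irrelevant' (label 0). The magnitude plays the role of an SVM margin.
    """
    pos_points = [x for x, y in zip(xs, labels) if y == 1]
    neg_points = [x for x, y in zip(xs, labels) if y == 0]
    if not pos_points or not neg_points:
        raise ValueError("need at least one relevant and one irrelevant example")
    pos, neg = _mean(pos_points), _mean(neg_points)
    return lambda x: _sq_dist(x, neg) - _sq_dist(x, pos)


def co_testing_queries(
    labeled_a: List[Vector],
    labeled_b: List[Vector],
    labels: List[int],
    pool_a: List[Vector],
    pool_b: List[Vector],
    k: int = 1,
) -> List[int]:
    """Return indices of unlabeled pool items to show the user for feedback."""
    score_a = train_view(labeled_a, labels)
    score_b = train_view(labeled_b, labels)
    scored = [(score_a(a), score_b(b), i) for i, (a, b) in enumerate(zip(pool_a, pool_b))]

    # Contention points: the two views predict opposite labels.
    contention = [(abs(sa) + abs(sb), i) for sa, sb, i in scored if (sa > 0) != (sb > 0)]
    if contention:
        candidates = contention
    else:
        # Hypothetical fallback: the views agree everywhere, so query the
        # items on which they are jointly least confident.
        candidates = [(abs(sa) + abs(sb), i) for sa, sb, i in scored]

    # Hypothetical ranking: smallest combined margin first (ties by index).
    candidates.sort()
    return [i for _, i in candidates[:k]]


if __name__ == "__main__":
    labeled = [[0.0], [10.0]]
    print(co_testing_queries(labeled, labeled, [0, 1],
                             [[9.0], [6.0], [1.0], [4.0]],
                             [[9.0], [2.0], [1.0], [5.5]], k=2))
```

With this toy data, pool items 1 and 3 are the only ones on which the two views disagree, so they form the query pool, and item 3 comes first because its combined margin is smaller. Ranking contention points by margin is only one possible policy; the original co-testing literature also describes choosing them at random or by comparing the two views' confidences.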