Action retrieval with relevance feedback on YouTube videos

Content-based retrieval systems are becoming increasingly relevant for managing large multimedia databases, such as those found on the Internet. In this paper, we investigate applying content-based retrieval with relevance feedback to the popular YouTube human action dataset[8], using a variety of methods to extract and compare features, in order to determine the most accurate techniques in this setting. Among other techniques, we explore soft-assignment code-book clustering, feature pruning, motion and static features, Adaboost and ABRS-SVM for relevance feedback. We evaluate the performance of several different systems to find the best combination of techniques for human action retrieval. We demonstrate that existing relevance feedback methods do not work well for YouTube media, and that a naive algorithm consistently outperforms these.

[1]  Xuelong Li,et al.  Asymmetric bagging and random subspace for support vector machines-based relevance feedback in image retrieval , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2]  Serge J. Belongie,et al.  Behavior recognition via sparse spatio-temporal features , 2005, 2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance.

[3]  Meng Wang,et al.  Active learning in multimedia annotation and retrieval: A survey , 2011, TIST.

[4]  Barbara Caputo,et al.  Recognizing human actions: a local SVM approach , 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004..

[5]  Chong-Wah Ngo,et al.  Towards optimal bag-of-features for object categorization and semantic video retrieval , 2007, CIVR '07.

[6]  Ling Shao,et al.  Retrieving Human Actions Using Spatio-Temporal Features and Relevance Feedback , 2010 .

[7]  Ling Shao,et al.  Feature detector and descriptor evaluation in human action recognition , 2010, CIVR '10.

[8]  Jiebo Luo,et al.  Recognizing realistic actions from videos “in the wild” , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[9]  Ling Shao,et al.  Relevance feedback for real-world human action retrieval , 2012, Pattern Recognit. Lett..

[10]  Remi Depommier,et al.  Content-based browsing of video sequences , 1994, MULTIMEDIA '94.

[11]  Yoav Freund,et al.  A decision-theoretic generalization of on-line learning and an application to boosting , 1995, EuroCOLT.

[12]  Ling Shao,et al.  Human action segmentation and recognition via motion and shape analysis , 2012, Pattern Recognit. Lett..

[13]  Ronen Basri,et al.  Actions as space-time shapes , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[14]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[15]  Rong Yan,et al.  Negative pseudo-relevance feedback in content-based video retrieval , 2003, MULTIMEDIA '03.