Improving video event retrieval by user feedback

In content-based video retrieval, videos are often indexed with semantic labels (concepts) using pre-trained classifiers. These pre-trained classifiers (concept detectors) are not perfect, so the labels are noisy. Additionally, the number of pre-trained classifiers is limited, and automatic methods often cannot represent the query adequately in terms of the available concepts. This problem is also apparent in the retrieval of events, such as a bike trick or a birthday party. Our solution is to obtain user feedback, which can be provided on two levels: concept level and video level. We introduce Adaptive Relevance Feedback (ARF), a method for video-level feedback that is based on the classical Rocchio relevance feedback method from Information Retrieval. Furthermore, we explore methods for concept-level feedback, such as re-weighting and Query Point Modification (QPM), as well as a method that changes the semantic space in which the concepts are represented. Methods on both levels are evaluated on the international benchmark TRECVID Multimedia Event Detection (MED) and compared to state-of-the-art methods. Results show that (1) relevance feedback on both concept and video level improves performance compared to using no relevance feedback; (2) relevance feedback on video level obtains higher performance than relevance feedback on concept level; and (3) our proposed ARF method on video level outperforms a state-of-the-art k-NN method, all methods on concept level, and even manually selected concepts.
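
For readers unfamiliar with the classical Rocchio method that ARF builds on, the following is a minimal sketch of the standard Rocchio update, not of ARF itself. It assumes the query and each video are represented as vectors of concept scores. The function names and the default weights alpha, beta and gamma are illustrative assumptions and are not taken from the paper.

```python
from typing import List, Sequence

Vector = List[float]


def centroid(vectors: Sequence[Sequence[float]], dim: int) -> Vector:
    """Component-wise mean of the vectors; the zero vector if none are given."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def rocchio_update(
    query: Sequence[float],
    relevant: Sequence[Sequence[float]],
    non_relevant: Sequence[Sequence[float]],
    alpha: float = 1.0,   # weight of the original query (assumed default)
    beta: float = 0.75,   # weight of the relevant centroid (assumed default)
    gamma: float = 0.15,  # weight of the non-relevant centroid (assumed default)
) -> Vector:
    """Classical Rocchio: move the query towards the videos marked relevant
    and away from the videos marked non-relevant."""
    dim = len(query)
    pos = centroid(relevant, dim)
    neg = centroid(non_relevant, dim)
    return [alpha * q + beta * p - gamma * n for q, p, n in zip(query, pos, neg)]


# Hypothetical three-concept example: the user marks two videos as relevant
# and one as non-relevant.
updated = rocchio_update(
    query=[1.0, 0.0, 0.0],
    relevant=[[1.0, 1.0, 0.0], [1.0, 0.0, 0.0]],
    non_relevant=[[0.0, 0.0, 1.0]],
    alpha=1.0,
    beta=0.5,
    gamma=0.5,
)
print(updated)  # [1.5, 0.25, -0.5]
```

In this sketch the updated vector would then be used to re-score all videos, for example by a dot product with each video's concept scores. Some implementations additionally clip negative weights to zero, which is omitted here.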
