Memory recall based video search: Finding videos you have seen before based on your memory

We often remember images and videos that we have seen or recorded before but cannot quite recall their exact locations or content details. What remains is typically a vague memory that can be expressed as a textual description and/or a rough visual impression of the scenes. Using these vague memories, we then want to search for the corresponding videos of interest. We call this “Memory Recall based Video Search” (MRVS). To tackle this problem, we propose a video search system that allows users to express a vague, incomplete query as a combination of a text query, a sequence of visual queries, and/or concept queries. Here, a visual query typically takes the form of a visual sketch depicting the outline of a scene within the desired video, while the corresponding concept query lists the visual concepts that appear in that scene. Because the query specified by users is generally approximate or incomplete, we develop techniques that handle this inexact and incomplete specification and also leverage user feedback to refine it. We employ several innovative approaches to enhance the automatic search. First, we use a visual query suggestion model to automatically suggest potential visual features to users as better queries. Second, we use a color similarity matrix to compensate for inexact color specification in visual queries. Third, we exploit the ordering of visual queries and/or concept queries to rerank the results with a greedy algorithm. Moreover, since the query is inexact and there is likely to be only one or a few possible answers, we incorporate an interactive feedback loop that lets users label related samples that are visually similar or semantically close to the relevant sample. Based on the labeled samples, we then propose optimization algorithms that update the visual queries and concept weights to refine the search results.
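The role of the color similarity matrix can be illustrated with a minimal sketch. The palette, similarity values, and histogram layout below are illustrative assumptions, not the paper's actual features: the point is only that a remembered color which is approximately right still earns partial credit against a keyframe's color histogram, where exact bin matching would score zero.

```python
import numpy as np

# Hypothetical 4-color palette: red, orange, blue, cyan.
# S[i][j] encodes perceptual similarity between palette colors, so an
# "orange" stroke in a sketch still partially matches a mostly-red scene.
S = np.array([
    [1.0, 0.7, 0.0, 0.0],   # red
    [0.7, 1.0, 0.0, 0.1],   # orange
    [0.0, 0.0, 1.0, 0.6],   # blue
    [0.0, 0.1, 0.6, 1.0],   # cyan
])

def color_score(sketch_hist, frame_hist, sim=S):
    """Soft histogram match h_q^T S h_v instead of exact bin overlap."""
    return float(sketch_hist @ sim @ frame_hist)

sketch = np.array([0.0, 1.0, 0.0, 0.0])   # user remembered "orange"
frame  = np.array([0.9, 0.0, 0.1, 0.0])   # actual keyframe is mostly red
```

Here exact overlap (`sketch @ frame`) is zero, while `color_score(sketch, frame)` returns a usable partial match, which is the compensation effect the similarity matrix provides.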
We conduct experiments on two large-scale video datasets: TRECVID 2010 and YouTube. The experimental results demonstrate that our proposed system is effective for MRVS tasks.
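The order-aware reranking idea can be sketched as a simple greedy matcher. The per-shot score table and the specific greedy rule below are illustrative assumptions rather than the paper's implementation; they show how a video whose shots occur in the remembered order accumulates a higher score.

```python
def ordered_match_score(shot_scores):
    """Greedily match an ordered sequence of queries to a video's shots.

    shot_scores[q][s] = similarity of the q-th query (in remembered order)
    to the s-th shot of the candidate video.  Each query is greedily
    assigned the best-scoring shot that occurs strictly after the shot
    chosen for the previous query, rewarding videos whose scenes appear
    in the order the user remembers them.
    """
    total, last = 0.0, -1
    for scores in shot_scores:            # queries in remembered order
        best_s, best = None, 0.0
        for s in range(last + 1, len(scores)):
            if scores[s] > best:
                best_s, best = s, scores[s]
        if best_s is None:                # no shot left after `last`
            continue
        total += best
        last = best_s
    return total

# Two remembered scenes, candidate video with three shots: the first query
# matches shot 1 best, so the second query may only use shots after it.
score = ordered_match_score([[0.2, 0.9, 0.1],
                             [0.8, 0.1, 0.7]])
```

A candidate whose matching shots appear out of order is forced onto weaker later shots (or none at all), so it reranks below candidates that respect the remembered sequence.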
