A Multimodal and Multilevel Ranking Scheme for Large-Scale Video Retrieval

A critical issue in large-scale multimedia retrieval is how to develop an effective framework for ranking search results. This problem is particularly challenging for content-based video retrieval because of short text queries, insufficient training samples, the need to fuse multimodal content, and the cost of learning over very large media collections. In this paper, we propose a novel multimodal and multilevel (MMML) ranking framework to address these challenges. We represent the video retrieval task with graphs and propose a graph-based semi-supervised ranking (SSR) scheme that learns effectively from small samples and integrates multimodal resources for ranking. To make the semi-supervised ranking solution practical for large-scale retrieval tasks, we propose a multilevel ranking framework that unifies several different ranking approaches in a cascade fashion. We have conducted empirical evaluations of the proposed solution on automatic search tasks using the TRECVID 2005 benchmark testbed. The empirical results show that our ranking solutions are effective and competitive with state-of-the-art solutions in the TRECVID evaluations.
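
The two ideas summarized above, graph-based semi-supervised ranking over fused modalities and a cascade that restricts the expensive step to a short candidate list, can be illustrated with a minimal sketch. It is an illustration under stated assumptions rather than the formulation used in the paper: the regularized propagation rule f = (I - alpha*S)^-1 y, the Gaussian edge weights, the weighted-sum fusion of per-modality graphs, and all function and parameter names (rbf_affinity, graph_rank, cascade_rank, top_k, n_seeds) are hypothetical choices made for this example.

```python
import numpy as np

def rbf_affinity(X, sigma=1.0):
    # Gaussian edge weights from pairwise squared Euclidean distances
    # (assumed similarity measure; the abstract does not specify one).
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W

def graph_rank(W, y, alpha=0.9):
    # Symmetrically normalize the graph, S = D^-1/2 W D^-1/2, then solve
    # (I - alpha * S) f = y so that seed scores in y spread along edges.
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.linalg.solve(np.eye(len(y)) - alpha * S, y)

def cascade_rank(text_scores, feats_by_modality, weights, top_k, n_seeds):
    # Level 1: a cheap text score ranks the whole collection; keep top_k.
    candidates = np.argsort(-text_scores)[:top_k]
    # Level 2: build one fused graph over the candidates only, as a
    # weighted sum of per-modality affinity matrices (assumed fusion rule).
    W = sum(w * rbf_affinity(F[candidates])
            for w, F in zip(weights, feats_by_modality))
    # Treat the best text hits as pseudo-positive seeds (an assumption).
    y = np.zeros(len(candidates))
    y[:n_seeds] = 1.0
    f = graph_rank(W, y)
    return candidates[np.argsort(-f)]
```

Calling cascade_rank with a vector of text scores and one feature matrix per modality returns the indices of the top_k text candidates, reordered by the graph scores. Because the linear solve runs on a top_k by top_k system rather than on the full collection, the costly semi-supervised step stays tractable as the collection grows.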
