Episode-Constrained Cross-Validation in Video Concept Retrieval

Whereas a video tells a narrative through a composition of shots, current video retrieval methods focus mainly on single shots. When retrieval performance is estimated, similar shots from the same narrative can appear in both the training and the test data, which may lead to overestimated performance. We propose an episode-constrained version of cross-validation that yields up to 14% classification improvement over shot-based cross-validation.
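
The idea of keeping a narrative together during cross-validation can be illustrated with a minimal sketch. It is not the authors' implementation; it only assumes that each shot carries an episode label and that all shots of one episode must land in the same fold. The function name `episode_constrained_folds`, the greedy balancing step, and the toy episode labels are hypothetical choices made for illustration.

```python
from collections import defaultdict


def episode_constrained_folds(episode_ids, n_folds):
    """Yield (train_idx, test_idx) pairs where no episode is split across folds.

    episode_ids[i] is the (hypothetical) episode label of shot i.
    """
    # Group shot indices by the episode they belong to.
    shots_by_episode = defaultdict(list)
    for shot_idx, episode in enumerate(episode_ids):
        shots_by_episode[episode].append(shot_idx)
    if n_folds < 2 or n_folds > len(shots_by_episode):
        raise ValueError("n_folds must be between 2 and the number of episodes")

    # Greedy balancing (an assumption of this sketch): place the largest
    # episodes first, each into the currently smallest fold, so that folds
    # end up with similar shot counts.
    folds = [[] for _ in range(n_folds)]
    by_size = sorted(shots_by_episode, key=lambda e: -len(shots_by_episode[e]))
    for episode in by_size:
        smallest = min(range(n_folds), key=lambda f: len(folds[f]))
        folds[smallest].extend(shots_by_episode[episode])

    # Each fold serves once as the test set; the remaining folds form training.
    for f in range(n_folds):
        test_idx = sorted(folds[f])
        train_idx = sorted(i for g in range(n_folds) if g != f for i in folds[g])
        yield train_idx, test_idx


# Toy example: 8 shots drawn from 4 episodes (labels are made up).
episodes = ["news_a", "news_a", "news_a", "news_b",
            "news_b", "news_c", "news_d", "news_d"]
splits = list(episode_constrained_folds(episodes, n_folds=2))
for train_idx, test_idx in splits:
    train_eps = {episodes[i] for i in train_idx}
    test_eps = {episodes[i] for i in test_idx}
    print(sorted(test_eps), "held out; shared with training:", train_eps & test_eps)
```

In a shot-based split, by contrast, shots would be assigned to folds individually, so near-duplicate shots from one episode could sit on both sides of the train/test boundary. Where scikit-learn is available, its `GroupKFold` splitter provides a comparable group-aware split.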
