Extreme video retrieval: joint maximization of human and computer performance

We present an efficient system for video search that maximizes the use of human bandwidth while exploiting the machine's ability to learn in real time from video clips the user has selected as relevant. The system exploits the human capability for rapidly scanning imagery, augmenting it with an active learning loop that attempts to always present the most relevant material based on the information available so far. Two versions of the human interface were evaluated: one with variable page sizes and manual paging, the other with a fixed page size and automatic paging. Both demand the user's full attention and focus for optimal performance. In either case, as users search and find relevant results, the system can invisibly re-rank its previous best guesses using several knowledge sources, such as image similarity, text similarity, and temporal proximity. Experimental evidence shows that combining these extremes of human and machine power yields a significant improvement over either approach alone.
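The abstract does not say how the knowledge sources are weighted or combined during re-ranking. As a rough illustration only, the Python sketch below assumes a simple weighted late fusion. Each unjudged shot is scored by its best image, text, and temporal match to the shots the user has already marked relevant, and the next fixed-size page is then taken from the top of the re-ranked list. The `Shot` record, the cosine/Jaccard/exponential-decay similarity choices, the fusion `weights`, the decay constant `tau`, and `page_size` are all hypothetical and are not the system's actual design.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Shot:
    """Hypothetical record for one video shot (fields are assumptions)."""
    shot_id: str
    video_id: str
    start_sec: float   # shot start time within its video, in seconds
    image_vec: tuple   # assumed precomputed image feature vector
    words: frozenset   # assumed transcript words spoken during the shot


def cosine(a, b):
    """Image similarity: cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def jaccard(a, b):
    """Text similarity: overlap of the two shots' word sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def temporal(a, b, tau=30.0):
    """Temporal proximity: decays with the time gap; zero across different videos."""
    if a.video_id != b.video_id:
        return 0.0
    return math.exp(-abs(a.start_sec - b.start_sec) / tau)


def rerank(candidates, relevant, weights=(0.5, 0.3, 0.2)):
    """Re-order unjudged shots by weighted similarity to user-selected relevant shots."""
    w_img, w_txt, w_tmp = weights  # hypothetical fusion weights

    def score(shot):
        if not relevant:
            return 0.0  # no feedback yet: every shot ties, so the order is kept
        return (w_img * max(cosine(shot.image_vec, r.image_vec) for r in relevant)
                + w_txt * max(jaccard(shot.words, r.words) for r in relevant)
                + w_tmp * max(temporal(shot, r) for r in relevant))

    # sorted() is stable even with reverse=True, so ties keep the previous ranking.
    return sorted(candidates, key=score, reverse=True)


def next_page(ranked, seen, page_size=9):
    """Next fixed-size page of shots the user has not viewed yet."""
    return [s for s in ranked if s.shot_id not in seen][:page_size]


if __name__ == "__main__":
    a = Shot("a", "v1", 0.0, (1.0, 0.0), frozenset({"fire", "truck"}))
    b = Shot("b", "v1", 10.0, (0.9, 0.1), frozenset({"fire"}))
    c = Shot("c", "v2", 0.0, (0.0, 1.0), frozenset({"weather"}))
    # The user marks shot "a" relevant; "b" shares its video, imagery and words.
    ranked = rerank([c, b], relevant=[a])
    print([s.shot_id for s in next_page(ranked, seen={"a"})])  # ['b', 'c']
```

Taking the maximum over the relevant shots (rather than the mean) is one possible choice: it lets a candidate rise whenever it resembles any single positive example, which suits searches whose relevant results are visually diverse.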
