A bootstrapping framework for annotating and retrieving WWW images

Most current image retrieval systems and commercial search engines rely mainly on text annotations to index and retrieve WWW images. This research explores the use of machine learning approaches to automatically annotate WWW images with concepts from a predefined list, fusing evidence from the image contents and their associated HTML text. A major practical limitation of supervised machine learning approaches is that effective learning requires a large set of labeled training samples. Labeling such a set is tedious and severely impedes the practical development of effective search techniques for WWW images, which are dynamic and fast-changing. Because web images possess both intrinsic visual contents and text annotations, they provide a strong basis for bootstrapping the learning process through co-training, which uses classifiers based on two orthogonal sets of features: visual and text. The idea of co-training is to start from a small set of labeled training samples and successively annotate a larger set of unlabeled samples using the two orthogonal classifiers. We carry out experiments on a set of over 5,000 images acquired from the Web and explore different combinations of HTML text and visual representations. We find that our bootstrapping approach achieves performance comparable to that of the supervised learning approach, with an F1 measure of over 54%, while requiring only a small initial set of training samples.
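The co-training loop described above can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' implementation: each view (visual and text) gets a toy nearest-centroid classifier, confidence is taken as the gap between a sample's distances to the two class centroids, and in every round each classifier moves its `per_round` most confident predictions from the unlabeled pool into the shared labeled set. The names `NearestCentroid`, `co_train`, `rounds`, `per_round`, and `f1` are hypothetical; a real system would use stronger classifiers (for example SVMs) over real visual and HTML-text feature vectors. The `f1` helper shows how the F1 measure combines precision and recall.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Labeled = Tuple[Vector, Vector, int]   # (visual features, text features, label)
Unlabeled = Tuple[Vector, Vector]      # (visual features, text features)


def centroid(rows: List[Vector]) -> List[float]:
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def dist(a: Vector, b: Vector) -> float:
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class NearestCentroid:
    """Toy binary classifier on ONE feature view (assumed stand-in for e.g. an SVM)."""

    def fit(self, X: List[Vector], y: List[int]) -> "NearestCentroid":
        # Assumes both classes 0 and 1 appear in the labeled data.
        self.c = {lab: centroid([x for x, t in zip(X, y) if t == lab]) for lab in (0, 1)}
        return self

    def predict_with_confidence(self, x: Vector) -> Tuple[int, float]:
        d0, d1 = dist(x, self.c[0]), dist(x, self.c[1])
        # Larger distance gap = more confident prediction.
        return (0 if d0 <= d1 else 1), abs(d0 - d1)


def co_train(labeled: List[Labeled], unlabeled: List[Unlabeled],
             rounds: int = 5, per_round: int = 1) -> List[Labeled]:
    """Grow the labeled set by letting each view's classifier label for the other."""
    labeled, pool = list(labeled), list(unlabeled)  # do not mutate the inputs
    for _ in range(rounds):
        if not pool:
            break
        ys = [y for _, _, y in labeled]
        vis = NearestCentroid().fit([v for v, _, _ in labeled], ys)
        txt = NearestCentroid().fit([t for _, t, _ in labeled], ys)
        for clf, view in ((vis, 0), (txt, 1)):
            for _ in range(per_round):
                if not pool:
                    break
                preds = [clf.predict_with_confidence(s[view]) for s in pool]
                best = max(range(len(pool)), key=lambda i: preds[i][1])
                v, t = pool.pop(best)
                # The predicted label from this view joins the shared training set.
                labeled.append((v, t, preds[best][0]))
    return labeled


def f1(y_true: List[int], y_pred: List[int], positive: int = 1) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for a, b in zip(y_true, y_pred) if a == positive and b == positive)
    fp = sum(1 for a, b in zip(y_true, y_pred) if a != positive and b == positive)
    fn = sum(1 for a, b in zip(y_true, y_pred) if a == positive and b != positive)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Both classifiers are retrained at the start of every round, so labels contributed by the visual view can shape the next text model and vice versa; this exchange between orthogonal views is what lets a small seed set bootstrap a larger one.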
