Cortina: a system for large-scale, content-based web image retrieval

Recent advances in the processing and networking capabilities of computers have led to the accumulation of immense amounts of multimedia data such as images. One of the largest repositories for such data is the World Wide Web (WWW). We present Cortina, a large-scale image retrieval system for the WWW that to date handles over 3 million images. The system retrieves images based on both visual features and collateral text. We show that a search process consisting of an initial query-by-keyword or query-by-image, followed by relevance feedback on the visual appearance of the results, is feasible for large-scale data sets. We also show that this process is superior to the pure text retrieval commonly used in large-scale systems. Semantic relationships in the data are explored and exploited by data mining, and multiple feature spaces are included in the search process.
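The search loop described above (an initial query, then relevance feedback on the visual appearance of the results, over several feature spaces) can be illustrated with a minimal Python sketch. It is not Cortina's implementation. The feature-space names, the weighted-sum fusion of per-space Euclidean distances, and the Rocchio-style query-point-movement update with hypothetical parameters `beta` and `gamma` are all assumptions chosen for illustration. The exhaustive scan in `rank` would also not scale to millions of images without an index.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical representation: each image maps feature-space names
# (e.g. "color", "texture") to a fixed-length descriptor vector.
FeatureSet = Dict[str, List[float]]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def combined_distance(query: FeatureSet, image: FeatureSet,
                      weights: Dict[str, float]) -> float:
    # Assumed late fusion: weighted sum of the per-space distances.
    return sum(w * euclidean(query[name], image[name])
               for name, w in weights.items())


def rank(query: FeatureSet, database: Dict[str, FeatureSet],
         weights: Dict[str, float], k: int) -> List[str]:
    # Exhaustive scan over all images, closest first.
    ordered = sorted(database, key=lambda image_id: combined_distance(
        query, database[image_id], weights))
    return ordered[:k]


def mean(vectors: List[List[float]]) -> List[float]:
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def feedback_update(query: FeatureSet, relevant: List[FeatureSet],
                    nonrelevant: List[FeatureSet],
                    beta: float = 0.5, gamma: float = 0.1) -> FeatureSet:
    # Rocchio-style query-point movement, applied in each feature space:
    #   q' = q + beta * (mean(R) - q) - gamma * (mean(N) - q)
    # The coefficients on q, mean(R) and mean(N) sum to 1, so the
    # descriptor is moved but not rescaled.
    updated: FeatureSet = {}
    for name, q in query.items():
        new_q = list(q)
        if relevant:
            r = mean([img[name] for img in relevant])
            new_q = [x + beta * (ri - qi) for x, ri, qi in zip(new_q, r, q)]
        if nonrelevant:
            n = mean([img[name] for img in nonrelevant])
            new_q = [x - gamma * (ni - qi) for x, ni, qi in zip(new_q, n, q)]
        updated[name] = new_q
    return updated


if __name__ == "__main__":
    # Toy data; descriptors and weights are made up for illustration.
    db = {
        "beach1": {"color": [0.9, 0.1], "texture": [0.2, 0.2]},
        "beach2": {"color": [0.8, 0.2], "texture": [0.3, 0.1]},
        "forest": {"color": [0.1, 0.9], "texture": [0.7, 0.8]},
    }
    weights = {"color": 1.0, "texture": 0.5}
    query = {"color": [0.5, 0.5], "texture": [0.5, 0.5]}
    print(rank(query, db, weights, 3))  # ['beach2', 'forest', 'beach1']
    # One feedback round: the user marks beach1 relevant, forest not.
    query = feedback_update(query, [db["beach1"]], [db["forest"]])
    print(rank(query, db, weights, 3))  # ['beach2', 'beach1', 'forest']
```

Marking `beach1` relevant and `forest` non-relevant pulls the query toward beach-like descriptors in both feature spaces, so `forest` drops to last place after a single round of feedback.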
