Crawling, Indexing, and Similarity Searching Images on the Web

In this paper, we report on our experience building an experimental similarity search system over a test collection of more than 50 million images, with the aim of showing that Content-based Image Retrieval (CBIR) systems can scale toward the size of the Web. First, we had to tackle the non-trivial tasks of image crawling and descriptive feature extraction, which we performed on the European EGEE computing grid. The resulting test collection, the first of this scale, will be opened to the research community for experiments and comparisons. Second, we had to develop indexing and search mechanisms that scale to these volumes and answer similarity queries in real time. The results of our experiments are very encouraging for future applications.
