Searching near-replicas of images via clustering

Internet piracy is a major concern for Web publishing. In this study we present RIME, a prototype system for detecting unauthorized image copying on the WWW. To speed up copy detection, RIME uses a new clustering/hashing approach that first clusters similar images on adjacent disk cylinders and then builds indexes to access those clusters. Searching for the replicas of an image often takes just one I/O to look up the location of the cluster containing similar objects and one sequential file I/O to read in that cluster. Our experimental results show that RIME detects image copies both more efficiently and more effectively than traditional content-based image retrieval systems that use tree-like structures to index images. In addition, RIME copes well with image format conversion, resampling, requantization, and geometric transformation.
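
The two-step search described in the abstract (one lookup to locate a cluster, then one sequential read of it) can be illustrated with a minimal in-memory sketch. This is not RIME's actual algorithm: the grid-cell hash, the `ClusterIndex` class, the `cell_width` parameter, and the sample feature vectors are all assumptions made for illustration, and a Python list stands in for a cluster stored contiguously on disk.

```python
from collections import defaultdict
from math import dist, floor


class ClusterIndex:
    """Toy stand-in for a cluster-then-index layout (hypothetical, not RIME's code)."""

    def __init__(self, cell_width):
        # Assumed parameter: side length of the grid cells used as the hash.
        self.cell_width = cell_width
        # cell id -> list of (image_id, feature_vector).
        # Each list models one cluster that would sit contiguously on disk.
        self.clusters = defaultdict(list)

    def _cell(self, vector):
        # Hash a feature vector to the grid cell that contains it.
        return tuple(floor(x / self.cell_width) for x in vector)

    def add(self, image_id, vector):
        self.clusters[self._cell(vector)].append((image_id, tuple(vector)))

    def near_replicas(self, vector, max_distance):
        # Step 1: a single index lookup finds the cluster for the query.
        cluster = self.clusters.get(self._cell(vector), [])
        # Step 2: a single sequential pass over that cluster filters by distance.
        return [image_id for image_id, v in cluster
                if dist(v, vector) <= max_distance]


index = ClusterIndex(cell_width=0.25)
index.add("original.png", (0.10, 0.20, 0.05))
index.add("resampled.jpg", (0.12, 0.21, 0.06))
index.add("unrelated.gif", (0.90, 0.80, 0.70))

matches = index.near_replicas((0.11, 0.20, 0.05), max_distance=0.05)
```

In this sketch only the query's own cell is scanned, so a replica that falls just across a cell boundary would be missed; a real system would need some strategy for neighboring clusters, which the abstract does not detail.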
