Enhanced perceptual distance functions and indexing for image replica recognition

The proliferation of digital images, together with the widespread distribution of digital data made possible by the Internet, has aggravated the problem of copyright infringement on digital images. Watermarking schemes have been proposed to safeguard copyrighted images, but watermarks are vulnerable to image processing and geometric distortions, which limits their effectiveness. Content-based detection of pirated images has therefore become an important application. In this paper, we discuss two important aspects of such a replica detection system: distance functions for similarity measurement, and scalability. We extend our previous work on perceptual distance functions, which proposed the Dynamic Partial Function (DPF), and present enhanced techniques that overcome the limitations of DPF: the Thresholding, Sampling, and Weighting schemes. Experimental evaluations show that these schemes outperform DPF and other distance functions. We then address the issue of using these perceptual distance functions to detect replicas efficiently in large image data sets. Indexing is made challenging by high dimensionality and by the nonmetric nature of the distance functions. We propose using Locality Sensitive Hashing (LSH) to index images under these perceptual distance functions, and demonstrate good performance through empirical studies on a very large database of diverse images.
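As a rough illustration of the baseline that the enhanced schemes build on, the sketch below computes a DPF-style distance between two feature vectors. It assumes the commonly described form of DPF, in which only the m smallest per-dimension differences contribute to a Minkowski-like sum; the function name `dpf_distance` and the parameters `m` and `r` are chosen here for illustration and are not taken from the abstract. The Thresholding, Sampling, and Weighting schemes are not shown, since the abstract does not define them.

```python
def dpf_distance(x, y, m, r=2.0):
    """DPF-style distance (illustrative sketch, not the authors' code).

    Assumption: only the m smallest per-dimension absolute differences
    are aggregated, so dimensions where the two images differ most
    (for example, after cropping or local edits) are ignored.
    """
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    if not 1 <= m <= len(x):
        raise ValueError("m must be between 1 and the number of features")

    # Per-dimension absolute differences between the two feature vectors.
    diffs = [abs(a - b) for a, b in zip(x, y)]

    # Keep only the m smallest differences (the "partial" part of DPF).
    smallest = sorted(diffs)[:m]

    # Minkowski-like aggregation over the retained dimensions.
    return sum(d ** r for d in smallest) ** (1.0 / r)


if __name__ == "__main__":
    original = [0.0, 0.0, 0.0, 0.0]
    replica = [1.0, 2.0, 3.0, 10.0]  # last feature heavily distorted
    print(dpf_distance(original, replica, m=3))  # ignores the distorted feature
    print(dpf_distance(original, replica, m=4))  # equals the Euclidean distance
```

Because the set of retained dimensions changes from one pair of images to the next, a function of this form need not satisfy the triangle inequality, which is one way to read the abstract's remark that indexing must cope with nonmetric distance functions.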
