DynDex: a dynamic and non-metric space indexer

To date, almost all research in the Content-Based Image Retrieval (CBIR) community has used Minkowski-like functions to measure similarity between images. In this paper, we first present a non-metric distance function, the dynamic partial function (DPF), which works significantly better than Minkowski-like functions for measuring perceptual similarity, and we explain DPF's link to similarity theories in cognitive science. We then propose DynDex, an indexing method that handles both the dynamic and non-metric aspects of this distance function. DynDex employs statistical methods, including distance-based classification and bagging, to enable efficient indexing with DPF. In addition to its efficiency in conducting similarity searches in very high-dimensional spaces, we show that DynDex remains effective when features are weighted dynamically to support personalized searches.
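To make the contrast with Minkowski-like functions concrete, here is a minimal sketch of a DPF-style distance. The abstract does not state the formula, so the sketch assumes the definition commonly given for DPF: only the m smallest per-feature differences between two vectors are aggregated, Minkowski-style, with exponent r. The function names, the parameters `m` and `r`, and the toy vectors are illustrative assumptions, not the paper's code.

```python
def minkowski(x, y, r=2.0):
    """Minkowski-like distance over all feature dimensions."""
    return sum(abs(a - b) ** r for a, b in zip(x, y)) ** (1.0 / r)


def dpf(x, y, m, r=2.0):
    """DPF-style distance (assumed definition): aggregate only the m
    smallest per-feature differences. Which features are used depends
    on the pair being compared, hence "dynamic" and "partial"."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    if not 1 <= m <= len(x):
        raise ValueError("m must be between 1 and the number of features")
    deltas = sorted(abs(a - b) for a, b in zip(x, y))
    return sum(d ** r for d in deltas[:m]) ** (1.0 / r)


# Toy vectors (hypothetical) showing why such a function is non-metric:
# with m = 1 the triangle inequality fails.
x, y, z = (0, 0), (0, 10), (10, 10)
d_xy = dpf(x, y, m=1)  # x and y agree on the first feature
d_yz = dpf(y, z, m=1)  # y and z agree on the second feature
d_xz = dpf(x, z, m=1)  # x and z differ on every feature
triangle_violated = d_xz > d_xy + d_yz
```

When `m` equals the number of features, this sketch reduces to the ordinary Minkowski-like distance. For smaller `m` the triangle inequality can fail, as in the toy vectors above, which is one reason index structures designed for metric spaces do not apply directly.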
