Efficiency issues related to probability density function comparison

The CANDID project (Comparison Algorithm for Navigating Digital Image Databases) uses probability density functions (PDFs) of localized feature information to represent the content of an image for search and retrieval. A similarity measure between PDFs identifies database images that are similar to a user-provided query image. Unfortunately, comparing PDF-based signatures is very time-consuming. In this paper, we examine efficiency considerations that arise when working with PDFs. Because PDFs can take many forms, we examine the tradeoffs between accurate representation and efficient manipulation for several data sets. In particular, we typically represent each PDF as a Gaussian mixture (i.e., a weighted sum of Gaussian kernels) in the feature space. We find that constraining all Gaussian kernels to have principal axes aligned with the natural axes of the feature space simplifies computations involving these PDFs. We can further constrain the kernels to be hyperspherical rather than hyperellipsoidal, which simplifies computations even more and yields an order-of-magnitude speedup in signature comparison. This paper illustrates the tradeoffs encountered when using these constraints.
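To make the effect of these constraints concrete, the following is a minimal sketch and not the CANDID implementation. It assumes the similarity measure is a normalized inner product between two mixtures, which the abstract does not specify. It relies on the standard identity that the integral of the product of two Gaussians with means m1, m2 and covariances S1, S2 equals a Gaussian with covariance S1 + S2 evaluated at m1 - m2. All function names and the tuple layout (weight, mean, variance) are hypothetical. With axis-aligned kernels the covariance sum is diagonal, so no matrix inversion or determinant is needed; with hyperspherical kernels only a squared distance and one scalar variance remain.

```python
import math

def diag_kernel_overlap(m1, v1, m2, v2):
    """Integral of the product of two axis-aligned Gaussian kernels.

    m1, m2 are mean vectors; v1, v2 are per-axis variances.
    The summed covariance is diagonal, so the result factors into
    one-dimensional terms (accumulated in log space for stability).
    """
    log_val = 0.0
    for a, s, b, t in zip(m1, v1, m2, v2):
        var = s + t
        log_val += -0.5 * (math.log(2.0 * math.pi * var) + (a - b) ** 2 / var)
    return math.exp(log_val)

def spherical_kernel_overlap(m1, s1, m2, s2):
    """Same integral for hyperspherical kernels (one scalar variance each).

    Only the squared Euclidean distance between the means is needed.
    """
    d = len(m1)
    var = s1 + s2
    sq_dist = sum((a - b) ** 2 for a, b in zip(m1, m2))
    return math.exp(-0.5 * (d * math.log(2.0 * math.pi * var) + sq_dist / var))

def mixture_inner_product(p, q, overlap):
    """Inner product of two mixtures given as lists of (weight, mean, variance)."""
    return sum(wp * wq * overlap(mp, vp, mq, vq)
               for wp, mp, vp in p
               for wq, mq, vq in q)

def similarity(p, q, overlap):
    """Normalized inner product (an assumed measure); equals 1 when p == q."""
    pq = mixture_inner_product(p, q, overlap)
    pp = mixture_inner_product(p, p, overlap)
    qq = mixture_inner_product(q, q, overlap)
    return pq / math.sqrt(pp * qq)

# Hypothetical two-kernel signatures in a 2-D feature space.
p = [(0.6, [0.0, 0.0], [1.0, 1.0]), (0.4, [3.0, 1.0], [0.5, 0.5])]
q = [(0.5, [0.2, 0.1], [1.0, 2.0]), (0.5, [4.0, 1.0], [0.5, 1.0])]
score = similarity(p, q, diag_kernel_overlap)
```

In this sketch the cost of comparing two signatures grows with the product of their kernel counts, and the per-pair cost drops from a full covariance inversion in the general hyperellipsoidal case to a single loop over the feature axes (diagonal) or one distance computation (hyperspherical).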
