Characterization of color distributions with histograms and kernel density estimators

Color is widely used for content-based image retrieval. In these applications, the color properties of an image are characterized by the probability distribution of its colors. These distributions are very often estimated with histograms, although histograms have many drawbacks compared to other estimators such as kernel density methods. In this paper we investigate whether using kernel density estimators instead of histograms gives better descriptors of color images. We carried out experiments that use these descriptors to estimate the parameters of the underlying color distribution and to perform color-based image retrieval (CBIR), with the MPEG7 database of 5466 color images and 50 standard queries as the benchmark. Noisy images were also generated and fed into the CBIR application to test the robustness of the descriptors against noise. The results show that good density estimators are not necessarily good descriptors for CBIR applications: histograms perform better than kernel-based methods when used as descriptors for CBIR. In the second part of the paper, we discuss optimal values of important parameters in the construction of these descriptors, particularly the smoothing parameter (the bandwidth) of the estimators. Our experiments show that an over-smoothed bandwidth gives better retrieval performance.
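To make the two kinds of descriptor concrete, the following is a minimal sketch, not the implementation used in the paper. It assumes a single color channel normalized to [0, 1], a Gaussian kernel, and an L1 distance between descriptors; the function names, the 16-point descriptor length, and the two bandwidth values are hypothetical choices made only for illustration.

```python
import math
import random


def histogram_density(samples, bins, lo=0.0, hi=1.0):
    """Histogram estimate: per-bin density values that integrate to 1 on [lo, hi].

    Assumes every sample lies inside [lo, hi].
    """
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in samples:
        idx = min(int((x - lo) / width), bins - 1)  # put x == hi in the last bin
        counts[idx] += 1
    n = len(samples)
    return [c / (n * width) for c in counts]


def gaussian_kde(samples, grid, bandwidth):
    """Gaussian kernel density estimate evaluated at each point of `grid`."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((g - x) / bandwidth) ** 2) for x in samples)
        for g in grid
    ]


def l1_distance(p, q):
    """Sum of absolute differences between two equal-length descriptors."""
    return sum(abs(a - b) for a, b in zip(p, q))


# Synthetic "channel values" for one image: a two-mode mixture clipped to [0, 1].
rng = random.Random(0)
samples = [min(max(rng.gauss(0.3, 0.05), 0.0), 1.0) for _ in range(250)]
samples += [min(max(rng.gauss(0.7, 0.10), 0.0), 1.0) for _ in range(250)]

bins = 16
centers = [(i + 0.5) / bins for i in range(bins)]  # evaluate the KDE at bin centers

hist_descriptor = histogram_density(samples, bins)
kde_narrow = gaussian_kde(samples, centers, bandwidth=0.02)  # lightly smoothed
kde_wide = gaussian_kde(samples, centers, bandwidth=0.10)    # over-smoothed

print("L1(hist, narrow KDE):", l1_distance(hist_descriptor, kde_narrow))
print("L1(hist, wide KDE):  ", l1_distance(hist_descriptor, kde_wide))
```

In a retrieval setting, each image would be represented by one such fixed-length vector, and database images would be ranked by their distance to the query's vector. The bandwidth argument plays the role of the smoothing parameter discussed in the abstract; which value retrieves best is an empirical question that this sketch does not answer.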
