Effective Image Mining by Representing Color Histograms as Time Series

Due to the spread of digital libraries and digital cameras, and individuals' increased access to the WWW, the number of digital images in existence poses a great challenge. Easy access to such collections requires an index structure that facilitates random access to individual images and eases navigation among them. Because these images are typically not annotated or associated with descriptions, existing systems represent them by their extracted low-level features. In this paper, we demonstrate two image mining tasks, image classification and image clustering, which are preliminary steps toward indexing and navigation. Both tasks are based on the color distributions extracted from the images, which we represent as time series. To make this representation more effective and efficient for the data mining tasks, we represent each time series with a new representation called SAX (Symbolic Aggregate approXimation) [14]. The SAX-based representation is effective because it reduces dimensionality and admits a distance measure that lower-bounds the distance between the original time series. Our experiments demonstrate the feasibility of the approach.
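The pipeline described above (color histogram, read as a time series, then converted to a SAX word) can be sketched roughly as follows. This is only an illustrative sketch, not the implementation used in the paper: the function names, the 16-bin histogram, the number of segments, and the alphabet are all assumptions chosen for the example, and the histogram is built from a flat list of single-channel values rather than a real image.

```python
from statistics import NormalDist, mean, pstdev


def color_histogram(values, bins=16, max_value=256):
    """Count how many channel values (0..max_value-1) fall into each bin."""
    counts = [0] * bins
    for v in values:
        counts[v * bins // max_value] += 1
    return counts


def znormalize(series):
    """Rescale a series to zero mean and unit standard deviation."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0] * len(series)  # constant series: no shape to encode
    return [(x - mu) / sigma for x in series]


def paa(series, segments):
    """Piecewise Aggregate Approximation: mean of each equal-length frame."""
    if len(series) % segments:
        raise ValueError("series length must be divisible by segments")
    width = len(series) // segments
    return [mean(series[i * width:(i + 1) * width]) for i in range(segments)]


def sax_word(series, segments=4, alphabet="abcd"):
    """Map a series to a SAX word (assumed parameters, for illustration)."""
    k = len(alphabet)
    # Breakpoints cut the standard normal curve into k equiprobable regions.
    breakpoints = [NormalDist().inv_cdf(i / k) for i in range(1, k)]
    word = ""
    for value in paa(znormalize(series), segments):
        # The symbol index is the number of breakpoints below the value.
        word += alphabet[sum(value > b for b in breakpoints)]
    return word


# The bin index of the histogram plays the role of the time axis.
print(sax_word([1, 1, 1, 1, 10, 10, 10, 10], segments=2, alphabet="ab"))  # ab
```

Here the histogram bin index stands in for time, so a 16-bin histogram becomes a 16-point series that SAX compresses to a short word; in practice the bin count, word length, and alphabet size would be tuned to the data.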

[1] Dragutin Petkovic, et al. Query by Image and Video Content: The QBIC System, 1995, Computer.

[2] Chris H. Q. Ding, et al. Adaptive dimension reduction for clustering high dimensional data, 2002, Proceedings of the 2002 IEEE International Conference on Data Mining.

[3] Anil K. Jain, et al. Incremental learning for Bayesian classification of images, 1999, Proceedings 1999 International Conference on Image Processing.

[4] Arnold W. M. Smeulders, et al. Classification of images on the Internet by visual and textual information, 1999, Electronic Imaging.

[5] Eamonn J. Keogh, et al. HOT SAX: efficiently finding the most unusual time series subsequence, 2005, Fifth IEEE International Conference on Data Mining (ICDM'05).

[6] B. S. Manjunath, et al. NeTra: A toolbox for navigating large image databases, 1997, Multimedia Systems.

[7] Ben Bradshaw, et al. Semantic based image retrieval: a probabilistic approach, 2000, ACM Multimedia.

[8] Zaher Al Aghbari, et al. Hill-manipulation: An effective algorithm for color image segmentation, 2006, Image Vis. Comput.

[9] Shih-Fu Chang, et al. VisualSEEk: a fully automated content-based image query system, 1997, MULTIMEDIA '96.

[10] Ji Zhang, et al. Image Mining: Issues, Frameworks and Techniques, 2001, MDM/KDD.

[11] Joo-Hwee Lim, et al. Symbolic photograph content-based retrieval, 2002, CIKM '02.

[12] Anil K. Jain, et al. Content-based hierarchical classification of vacation images, 1999, Proceedings IEEE International Conference on Multimedia Computing and Systems.

[13] Jiawei Han, et al. Mining MultiMedia Data, 1999.

[14] Eamonn J. Keogh, et al. A symbolic representation of time series, with implications for streaming algorithms, 2003, DMKD '03.

[15] Joo-Hwee Lim. Explicit query formulation with visual keywords, 2000, ACM Multimedia.

[16] Thomas S. Huang, et al. Supporting content-based queries over images in MARS, 1997, Proceedings of IEEE International Conference on Multimedia Computing and Systems.