Content-Based Image Indexing by Data Clustering and Inverse Document Frequency

In this paper we present an algorithm for creating and searching large image databases. Effective browsing and searching of such image collections based on their content is one of the most important challenges in computer science. In the presented algorithm, inserting data into the database consists of several stages. In the first step, interest points are extracted from the images by a keypoint algorithm such as SIFT, SURF or PCA-SIFT. The resulting very large number of keypoints is then reduced by data clustering, in our case by a novel, parameterless version of the mean shift algorithm, which has been adapted specifically for the presented method. The reduction is achieved by performing the subsequent operations on the generated cluster centers rather than on the individual keypoints. Cluster centers are treated as terms and images as documents in the term frequency-inverse document frequency (TF-IDF) algorithm. TF-IDF makes it possible to build an indexed image database and to retrieve the desired images quickly. The proposed approach is validated by numerical experiments on images with different content.
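The indexing step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that every keypoint has already been assigned to a cluster center (a "visual word" index), and it uses the common weighting tf = count / total and idf = log(N / df) with cosine similarity for ranking, all of which are assumptions made here for illustration. The function names `build_index` and `rank_images` are hypothetical.

```python
import numpy as np

def build_index(word_ids_per_image, n_words):
    """Build a TF-IDF matrix: one row per image, one column per cluster center."""
    n_images = len(word_ids_per_image)
    counts = np.zeros((n_images, n_words))
    for i, ids in enumerate(word_ids_per_image):
        counts[i] = np.bincount(np.asarray(ids, dtype=np.int64), minlength=n_words)
    # Term frequency: share of an image's keypoints that fall in each cluster.
    tf = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1)
    # Document frequency: number of images in which each cluster occurs.
    df = (counts > 0).sum(axis=0)
    idf = np.log(n_images / np.maximum(df, 1))
    return tf * idf, idf

def rank_images(query_ids, index, idf):
    """Return image indices ordered from most to least similar to the query."""
    n_words = index.shape[1]
    q_counts = np.bincount(np.asarray(query_ids, dtype=np.int64), minlength=n_words)
    q = q_counts / max(q_counts.sum(), 1) * idf
    norms = np.linalg.norm(index, axis=1) * np.linalg.norm(q)
    scores = index @ q / np.where(norms > 0, norms, 1.0)  # cosine similarity
    return np.argsort(-scores)

# Toy data (made up): three images, four cluster centers.
images = [[0, 0, 1], [1, 2, 2], [3, 3, 0]]
index, idf = build_index(images, n_words=4)
ranking = rank_images([2, 2, 1], index, idf)
print(ranking)
```

In this toy data the query has the same cluster histogram as the second image, so that image is ranked first. Clusters that occur in every image receive an idf of zero and do not contribute to the score, which is the usual effect of inverse document frequency weighting.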
