A Survey On: Content Based Image Retrieval Systems Using Clustering Techniques For Large Data sets

Content-based image retrieval (CBIR) is a relatively new but widely adopted method for finding images in vast, unannotated image databases. As networks and multimedia technologies become more widespread, users are no longer satisfied with traditional information retrieval techniques, and CBIR is increasingly used as a means of accurate and fast retrieval. In recent years, a variety of techniques have been developed to improve the performance of CBIR. Data clustering is an unsupervised method for extracting hidden patterns from huge data sets. Large data sets may also be high dimensional, and achieving both accuracy and efficiency on high-dimensional data sets with an enormous number of samples is challenging. In this paper, clustering techniques are discussed and analysed. We also propose a method, HDK, that uses more than one clustering technique to improve the performance of CBIR. This method combines hierarchical and divide-and-conquer K-Means clustering with the concepts of equivalence and compatibility relations to improve the performance of K-Means on high-dimensional data sets. It also uses features such as color, texture and shape for accurate and effective retrieval.
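
The divide-and-conquer idea behind K-Means on large data sets can be illustrated with a minimal sketch. This is not the HDK method itself: the hierarchical step and the equivalence and compatibility relations are not modelled here, and the function names (`kmeans`, `divide_and_conquer_kmeans`, `assign`), the chunking scheme and the parameter defaults are assumptions made only for illustration. The sketch clusters each chunk of the data separately and then clusters the resulting local centroids.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on a list of tuples; returns k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids drawn from the data
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            groups[nearest].append(p)
        # Recompute each centroid as the mean of its group;
        # an empty group keeps its previous centroid.
        centroids = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids


def divide_and_conquer_kmeans(points, k, n_chunks=4, seed=0):
    """Illustrative two-stage scheme (an assumption, not the paper's HDK):
    1. divide: split the data into n_chunks interleaved chunks,
    2. conquer: run k-means on each chunk,
    3. combine: run k-means on the pooled local centroids.
    """
    chunks = [points[i::n_chunks] for i in range(n_chunks)]
    local_centroids = []
    for idx, chunk in enumerate(chunks):
        if chunk:
            local_centroids.extend(
                kmeans(chunk, min(k, len(chunk)), seed=seed + idx)
            )
    return kmeans(local_centroids, k, seed=seed)


def assign(point, centroids):
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)), key=lambda i: sq_dist(point, centroids[i]))
```

In a retrieval setting, `points` would hold one feature vector per image (for example color, texture and shape descriptors), and a query image would be compared only against images whose vectors fall in the cluster returned by `assign`. Because each chunk is clustered independently, the first stage never needs the full data set in one pass, which is the main appeal of the divide-and-conquer arrangement.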
