Binary data clustering based on Wiener transformation

Clustering is the process of grouping similar items. It becomes considerably more difficult as data dimensionality and sparsity increase. Binary data are the simplest form of data used in information systems for very large databases, and they are efficient, both computationally and in memory, for representing categorical data. Binary data are usually clustered by treating 0 and 1 directly as numerical values. In this paper, the binary data are instead preprocessed into real values with the Wiener transformation before clustering. The Wiener transformation is a linear, statistics-based transformation that is optimal in terms of mean square error. Computational results show that clustering based on the Wiener transformation is very efficient under both objective and subjective evaluation.
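The abstract does not give the exact form of the transformation or the clustering algorithm. The sketch below is therefore only an illustration under stated assumptions. It treats the binary data matrix like an image and applies a local adaptive Wiener filter of the kind implemented by MATLAB's `wiener2` and SciPy's `scipy.signal.wiener`, with the noise power estimated as the mean local variance. Unlike those functions, it clips the 3x3 window at the matrix borders instead of zero-padding. The rows of the resulting real-valued matrix are then clustered with plain k-means. The function names `wiener_transform` and `kmeans`, the toy matrix `DATA`, and the evenly spaced initialisation are all hypothetical.

```python
# Sketch: Wiener-transform a binary matrix, then cluster its rows with k-means.
# Assumed (not from the paper): 3x3 local window clipped at the borders,
# noise power = mean local variance, evenly spaced k-means initialisation.

def wiener_transform(data, half=1):
    """Map a 0/1 matrix to real values with a local adaptive Wiener filter."""
    rows, cols = len(data), len(data[0])
    means = [[0.0] * cols for _ in range(rows)]
    variances = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # Neighbourhood of (i, j), clipped at the matrix borders.
            window = [data[r][c]
                      for r in range(max(0, i - half), min(rows, i + half + 1))
                      for c in range(max(0, j - half), min(cols, j + half + 1))]
            mu = sum(window) / len(window)
            means[i][j] = mu
            variances[i][j] = sum((v - mu) ** 2 for v in window) / len(window)
    # Noise power estimated as the average local variance.
    noise = sum(map(sum, variances)) / (rows * cols)
    result = []
    for i in range(rows):
        row = []
        for j in range(cols):
            mu, var = means[i][j], variances[i][j]
            # Wiener gain: output collapses to the local mean where var <= noise.
            gain = (var - noise) / var if var > noise else 0.0
            row.append(mu + gain * (data[i][j] - mu))
        result.append(row)
    return result


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100):
    """Plain Lloyd-style k-means; returns one cluster label per row."""
    n = len(points)
    # Hypothetical deterministic initialisation: evenly spaced rows.
    centroids = [list(points[i * n // k]) for i in range(k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy data: two groups of binary records.
DATA = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
]

if __name__ == "__main__":
    real_valued = wiener_transform(DATA)
    print(kmeans(real_valued, k=2))  # [0, 0, 0, 1, 1, 1]
```

For binary input, every transformed value lies between the original bit and its local mean, so the data stay in [0, 1] but become graded rather than strictly 0/1. Because the filter window spans neighbouring rows, the result depends on row order. This is a property of this assumed image-style filter, and a column-wise variant could be substituted if that is undesirable.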
