An Internal Clustering Validation Index for Boolean Data

Abstract: Internal clustering validation is a key issue in clustering applications, especially when external information is not available. Existing measures have limitations that depend on the application, and internal validation of Boolean data clustering in particular still has deficiencies. This paper proposes a new Clustering Validation index based on Type of Attributes for Boolean data (CVTAB). It evaluates clustering quality in terms of the Dissimilarity of two clusters for Boolean Data (DBD). The attributes of a set of Boolean objects are categorized into three types: Type A (the attribute value is 1 for all objects in the set), Type O (the value is 0 for all objects in the set), and Type E (the value is not the same for all objects in the set). When two clusters are merged into one, DBD uses the number of attributes whose type changes and the number of objects changed to measure the dissimilarity of the two clusters. CVTAB evaluates clustering quality without relying on external information.
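The attribute typing described in the abstract can be illustrated with a minimal sketch. It only classifies attributes as Type A, O, or E and counts how many attributes change type when two clusters are merged; the abstract does not give the DBD or CVTAB formulas, so neither is computed here. The function names `attribute_types` and `count_type_changes` and the list-of-lists data layout are assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Sequence, Tuple


def attribute_types(cluster: Sequence[Sequence[int]]) -> List[str]:
    """Return 'A', 'O' or 'E' for each attribute (column) of a non-empty cluster.

    'A': the attribute is 1 for every object,
    'O': the attribute is 0 for every object,
    'E': the attribute value is not the same for all objects.
    """
    if not cluster:
        raise ValueError("cluster must contain at least one object")
    types = []
    for column in zip(*cluster):  # iterate attribute by attribute
        if all(v == 1 for v in column):
            types.append("A")
        elif all(v == 0 for v in column):
            types.append("O")
        else:
            types.append("E")
    return types


def count_type_changes(
    c1: Sequence[Sequence[int]], c2: Sequence[Sequence[int]]
) -> Tuple[int, int]:
    """Count attributes whose type changes when c1 and c2 are merged.

    Returns (changes relative to c1, changes relative to c2).
    This is an illustrative ingredient only, not the paper's DBD formula.
    """
    merged = attribute_types(list(c1) + list(c2))
    changed_1 = sum(t != m for t, m in zip(attribute_types(c1), merged))
    changed_2 = sum(t != m for t, m in zip(attribute_types(c2), merged))
    return changed_1, changed_2


# Hypothetical toy clusters: rows are objects, columns are Boolean attributes.
c1 = [[1, 0, 1], [1, 0, 0]]
c2 = [[1, 1, 0], [1, 1, 0]]

print(attribute_types(c1))         # ['A', 'O', 'E']
print(attribute_types(c2))         # ['A', 'A', 'O']
print(count_type_changes(c1, c2))  # (1, 2)
```

In this toy case the merged cluster has types A, E, E, so one attribute of the first cluster and two attributes of the second change type after merging.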
