Controlled-sized clustering based on optimization

Clustering is an unsupervised classification method: it partitions a data set into clusters without any external criterion. Typical clustering methods, such as k-means (KM) and fuzzy c-means (FCM), are constructed by optimizing a given objective function, and many other clustering methods are likewise formulated as optimization problems with characteristic objective functions and constraints. The objective function also serves as a criterion for evaluating clustering results. Together with its theoretical extensibility, this makes the optimization framework a highly advantageous basis for constructing clustering methods. From this viewpoint, some of the authors previously proposed even-sized clustering based on optimization (ECBO), which imposes a strengthened constraint on cluster sizes, and constructed several variations of it. The constraint in ECBO is that each cluster size is K or K + 1. ECBO is based on KM, and its algorithm is an iterative optimization in which the belongingness of each object to each cluster is calculated by the simplex method in every iteration. Numerical experiments show that ECBO achieves higher classification accuracy than other similar clustering methods. ECBO is considered to have advantages over similar methods in terms of clustering accuracy, cluster size, and optimization framework. However, the cluster-size constraint of ECBO is strict, which makes it inconvenient when a partition with uneven cluster sizes is desirable. Moreover, clustering algorithms in which each cluster size can be controlled are expected to handle a wider variety of datasets. In this paper, we first propose two new clustering algorithms based on ECBO in which each cluster size can be controlled. Next, we evaluate the new algorithms through numerical experiments.
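To make the size-constrained assignment step concrete, the following is a minimal sketch, not the authors' implementation. The abstract states that ECBO computes belongingness with the simplex method; this sketch instead solves the same kind of size-bounded assignment as a min-cost flow, which needs only the Python standard library. The function name `assign_with_size_bounds`, the bounds `lo` and `hi`, and the `cost` matrix of squared distances are all hypothetical names introduced for illustration. Setting `lo = K` and `hi = K + 1` corresponds to the even-sized constraint, while other bounds illustrate what controlling cluster sizes could look like.

```python
def assign_with_size_bounds(cost, lo, hi):
    """Assign each object to one cluster with every cluster size in [lo, hi].

    cost[k][i] is a non-negative dissimilarity between object k and cluster i.
    Returns a list of cluster indices that minimises the total cost.
    Assumption: solved as a min-cost flow, not with the simplex method.
    """
    n, c = len(cost), len(cost[0])
    if not c * lo <= n <= c * hi:
        raise ValueError("size bounds cannot be met for this n and c")
    src, snk, nodes = n + c, n + c + 1, n + c + 2
    # Reward larger than any achievable change in total cost, so that the
    # lower bound of every cluster is filled before anything else.
    big = 1.0 + sum(max(row) for row in cost)
    edges = []  # [target, residual capacity, cost]; edge e ^ 1 is its reverse
    graph = [[] for _ in range(nodes)]

    def add_edge(u, v, cap, w):
        graph[u].append(len(edges))
        edges.append([v, cap, w])
        graph[v].append(len(edges))
        edges.append([u, 0, -w])

    for k in range(n):
        add_edge(src, k, 1, 0.0)              # each object is assigned once
        for i in range(c):
            add_edge(k, n + i, 1, cost[k][i])
    for i in range(c):
        add_edge(n + i, snk, lo, -big)        # first lo seats are mandatory
        add_edge(n + i, snk, hi - lo, 0.0)    # remaining seats are optional

    inf = float("inf")
    for _ in range(n):                        # push one unit of flow per object
        dist = [inf] * nodes
        prev = [-1] * nodes
        dist[src] = 0.0
        for _ in range(nodes - 1):            # Bellman-Ford allows negative costs
            changed = False
            for u in range(nodes):
                if dist[u] == inf:
                    continue
                for e in graph[u]:
                    v, cap, w = edges[e]
                    if cap > 0 and dist[u] + w < dist[v] - 1e-12:
                        dist[v], prev[v], changed = dist[u] + w, e, True
            if not changed:
                break
        v = snk
        while v != src:                       # augment along the shortest path
            e = prev[v]
            edges[e][1] -= 1
            edges[e ^ 1][1] += 1
            v = edges[e ^ 1][0]

    labels = []
    for k in range(n):
        for e in graph[k]:
            # A saturated forward edge marks the cluster chosen for object k.
            if e % 2 == 0 and edges[e][1] == 0:
                labels.append(edges[e][0] - n)
    return labels


# Toy one-dimensional data with fixed cluster centres (illustrative values).
points = [0.0, 1.0, 2.0, 3.0, 10.0, 11.0]
centres = [1.5, 10.5]
cost = [[(x - v) ** 2 for v in centres] for x in points]
print(assign_with_size_bounds(cost, lo=3, hi=3))  # even sizes
print(assign_with_size_bounds(cost, lo=2, hi=4))  # relaxed sizes
```

In a full KM-style iteration, this assignment would alternate with recomputing each cluster centre as the mean of its assigned objects. On the toy data, the strict bounds force the point at 3.0 into the second cluster, while the relaxed bounds leave it with its nearest centre.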
