An Improvement of Stability Based Method to Clustering

In recent years, the concept of clustering stability has been widely used to determine the number of clusters in a given dataset. This paper proposes an improvement to stability methods based on the bootstrap technique. The improvement is achieved by combining the instability property with an evaluation criterion and by using a clustering algorithm based on DCA (Difference of Convex functions Algorithm). DCA is an innovative approach in nonconvex programming that has been successfully applied to many large-scale (smooth or nonsmooth) nonconvex programs in various domains. Experimental results on both synthetic and real datasets are promising and demonstrate the effectiveness of our approach.
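
To make the bootstrap instability idea concrete, here is a minimal Python sketch of how an instability score for a candidate number of clusters k might be estimated. It is an illustration under stated assumptions, not the method of this paper: plain Lloyd's k-means stands in for the DCA-based clustering algorithm, the pairwise disagreement measure is one common choice among several, the evaluation criterion mentioned in the abstract is left out, and all function names and parameters (`kmeans`, `assign`, `disagreement`, `instability`, `n_boot`) are hypothetical.

```python
import numpy as np

def kmeans(X, k, rng, n_iter=50):
    """Plain Lloyd's k-means. Assumption: a stand-in for the DCA-based
    clustering algorithm, which is not reproduced here."""
    # Fancy indexing returns a copy, so updating centers does not alter X.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(X, centers)
        for j in range(k):
            if np.any(labels == j):  # leave an empty cluster's center as is
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def assign(X, centers):
    """Label each point with the index of its nearest center."""
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def disagreement(a, b):
    """Fraction of point pairs grouped together by one labeling but
    separated by the other (invariant to label permutation)."""
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    n = len(a)
    return float((same_a != same_b).sum()) / (n * (n - 1))

def instability(X, k, n_boot=20, seed=0):
    """Average disagreement between clusterings fitted on pairs of
    bootstrap samples and then applied to the full dataset."""
    rng = np.random.default_rng(seed)
    n = len(X)
    scores = []
    for _ in range(n_boot):
        s1 = X[rng.integers(0, n, n)]  # bootstrap sample 1 (with replacement)
        s2 = X[rng.integers(0, n, n)]  # bootstrap sample 2
        c1 = kmeans(s1, k, rng)
        c2 = kmeans(s2, k, rng)
        scores.append(disagreement(assign(X, c1), assign(X, c2)))
    return float(np.mean(scores))
```

In a stability-based approach, one would typically compute `instability(X, k)` for a range of k values (with `X` a float array of shape `(n_samples, n_features)`) and prefer values of k with low instability. How that score is combined with an evaluation criterion is specific to the paper and is not shown in this sketch.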
