A clustering method based on boosting

Boosting is widely recognized as providing superior results for classification problems. In this paper, we propose the boost-clustering algorithm, a novel clustering methodology that exploits the general principles of boosting in order to provide a consistent partitioning of a dataset. The boost-clustering algorithm is a multi-clustering method. At each boosting iteration, a new training set is created using weighted random sampling from the original dataset, and a simple clustering algorithm (e.g., k-means) is applied to provide a new data partitioning. The final clustering solution is produced by aggregating the multiple clustering results through weighted voting. Experiments on both artificial and real-world datasets indicate that boost-clustering provides solutions of improved quality.
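
The loop described above (weighted resampling, a simple base clusterer, weighted voting) can be sketched as follows. This is only an illustrative sketch, not the algorithm as specified in the paper: the abstract does not give the weight-update rule, the vote weights, or how cluster labels are matched across iterations, so the confidence score `1 - d1/d2`, the uniform/hardness weight mix, the vote weight `alpha`, and the centroid-based label alignment are all assumptions made for this example. The base clusterer is a plain k-means written with the Python standard library, and points are assumed to be tuples of floats with `k >= 2`.

```python
import itertools
import math
import random


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means; returns k centroids (tuples)."""
    # Initialise from distinct points so that no two centroids coincide.
    # Assumes the sample contains at least k distinct points.
    centroids = rng.sample(sorted(set(points)), k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            groups[j].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centroids[j]
            for j, g in enumerate(groups)
        ]
    return centroids


def align(centroids, reference):
    """Reorder centroids to match the reference labelling (assumed scheme).

    Brute force over permutations, so only suitable for small k.
    """
    k = len(reference)
    best = min(
        itertools.permutations(range(k)),
        key=lambda perm: sum(
            math.dist(centroids[perm[j]], reference[j]) for j in range(k)
        ),
    )
    return [centroids[j] for j in best]


def boost_clustering(points, k, rounds=10, seed=0):
    """Return one cluster label per point, aggregated over several rounds."""
    rng = random.Random(seed)
    n = len(points)
    weights = [1.0 / n] * n                 # start from uniform sampling
    votes = [[0.0] * k for _ in range(n)]   # accumulated weighted votes
    reference = None
    for _ in range(rounds):
        # Weighted random sampling (with replacement) from the original data.
        sample = rng.choices(points, weights=weights, k=n)
        centroids = kmeans(sample, k, rng)
        if reference is None:
            reference = centroids
        else:
            centroids = align(centroids, reference)
        labels, confidence = [], []
        for p in points:
            dists = [math.dist(p, c) for c in centroids]
            order = sorted(range(k), key=dists.__getitem__)
            d1, d2 = dists[order[0]], dists[order[1]]
            labels.append(order[0])
            # Assumed confidence: 1 when p sits on its centroid, 0 on a boundary.
            confidence.append(1.0 - d1 / d2 if d2 > 0 else 0.0)
        # Assumed vote weight: partitions with confident assignments count more.
        alpha = sum(confidence) / n
        for i, label in enumerate(labels):
            votes[i][label] += alpha
        # Assumed update: poorly clustered points are sampled more often,
        # mixed with a uniform term so that no point is ever starved.
        hardness = [1.0 - c + 1e-9 for c in confidence]
        total = sum(hardness)
        weights = [0.5 / n + 0.5 * h / total for h in hardness]
    return [max(range(k), key=v.__getitem__) for v in votes]
```

Calling `boost_clustering(points, k=2)` on a list of 2-D tuples returns a list of integer labels in `range(k)`. The uniform term in the weight update is a design choice for this sketch: it keeps every point reachable by the sampler, which avoids degenerate samples with fewer than `k` distinct points.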
