Clusterer ensemble

Ensemble methods, which train multiple learners and then combine their predictions, have proven very effective in supervised learning. This paper explores ensemble methods for unsupervised learning. Here, an ensemble comprises multiple clusterers, each trained by the k-means algorithm with different initial points. The clusters discovered by different clusterers are aligned, i.e. similar clusters are assigned the same label, by counting their overlapping data items. Four methods are then developed to combine the aligned clusterers. Experiments show that clustering performance can be significantly improved by ensemble methods, and that using mutual information to select a subset of clusterers for weighted voting is a particularly good choice. Since the proposed methods work by analyzing the clustering results rather than the internal mechanisms of the component clusterers, they are applicable to many kinds of clustering algorithms.
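The pipeline described above (multiple k-means runs, label alignment by counting overlapping items, then voting) can be sketched as follows. This is a minimal illustration, not the paper's implementation: it uses scikit-learn's `KMeans` for the component clusterers, realizes the counting-based alignment via a Hungarian assignment on the overlap matrix, and combines the aligned clusterers with simple unweighted voting rather than the mutual-information-weighted variant. The function names `align_labels` and `ensemble_kmeans` are illustrative.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import KMeans

def align_labels(reference, labels, k):
    """Relabel `labels` so its clusters match `reference` as closely as possible."""
    # overlap[i, j] = number of items in reference cluster i and candidate cluster j
    overlap = np.zeros((k, k), dtype=int)
    for r, l in zip(reference, labels):
        overlap[r, l] += 1
    # Maximize total overlap: assignment problem on the negated matrix
    row, col = linear_sum_assignment(-overlap)
    mapping = np.empty(k, dtype=int)
    mapping[col] = row  # candidate cluster j -> reference cluster mapping[j]
    return mapping[labels]

def ensemble_kmeans(X, k, n_clusterers=10, seed=0):
    """Train several single-init k-means clusterers, align them, and vote."""
    rng = np.random.RandomState(seed)
    all_labels = []
    for _ in range(n_clusterers):
        km = KMeans(n_clusters=k, n_init=1,
                    random_state=rng.randint(10**6)).fit(X)
        all_labels.append(km.labels_)
    # Align every clusterer to the first one by counting overlapping items
    reference = all_labels[0]
    aligned = np.array([reference] +
                       [align_labels(reference, l, k) for l in all_labels[1:]])
    # Unweighted voting: the most common aligned label wins for each item
    return np.apply_along_axis(
        lambda col: np.bincount(col, minlength=k).argmax(), 0, aligned)
```

A weighted variant would multiply each clusterer's vote by a quality weight (e.g. derived from its average mutual information with the other clusterers) before taking the argmax.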
