Semi-Supervised Clustering Using Genetic Algorithms

A semi-supervised clustering algorithm is proposed that combines the benefits of supervised and unsupervised learning methods. The approach allows unlabeled data with no known class to be used to improve classification accuracy. The objective function of an unsupervised technique, e.g., K-means clustering, is modified to minimize both the cluster dispersion of the input attributes and a measure of cluster impurity based on the class labels. Minimizing the cluster dispersion of the examples is a form of capacity control that prevents overfitting. For the output labels, impurity measures from decision-tree algorithms, such as the Gini index, can be used. A genetic algorithm optimizes the objective function to produce clusters. Experimental results show that using class information improves the generalization ability compared to unsupervised methods based only on the input attributes.
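The combined objective can be illustrated with a small sketch: minimize `beta * dispersion + alpha * impurity`, where dispersion is the K-means sum of squared distances to the nearest center and impurity is a Gini index computed only over the labeled points in each cluster. This is not the authors' implementation. The weights `alpha` and `beta`, weighting each cluster's Gini by its share of the labeled points, encoding a GA individual as a list of `k` cluster centers, truncation selection, center-wise uniform crossover and Gaussian mutation are all assumptions chosen for illustration. All function names are hypothetical. Unlabeled points are marked `None` and contribute only to the dispersion term.

```python
import random


def assign(points, centers):
    """Return the index of the nearest center (squared Euclidean) for each point."""
    labels = []
    for p in points:
        d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
        labels.append(d.index(min(d)))
    return labels


def dispersion(points, centers, idx):
    """K-means-style dispersion: total squared distance to the assigned center."""
    return sum(sum((a - b) ** 2 for a, b in zip(p, centers[k])) for p, k in zip(points, idx))


def gini_impurity(idx, y, n_clusters):
    """Gini index per cluster over labeled points only (y[i] is None if unlabeled),
    weighted by each cluster's share of the labeled points (an assumed weighting)."""
    n_labeled = sum(1 for t in y if t is not None)
    if n_labeled == 0:
        return 0.0
    total = 0.0
    for k in range(n_clusters):
        counts = {}
        for a, t in zip(idx, y):
            if a == k and t is not None:
                counts[t] = counts.get(t, 0) + 1
        n_k = sum(counts.values())
        if n_k == 0:
            continue  # a cluster with no labeled points adds no impurity
        gini_k = 1.0 - sum((c / n_k) ** 2 for c in counts.values())
        total += (n_k / n_labeled) * gini_k
    return total


def objective(points, y, centers, alpha=1.0, beta=1.0):
    """Weighted sum of cluster dispersion and cluster impurity (lower is better)."""
    idx = assign(points, centers)
    return beta * dispersion(points, centers, idx) + alpha * gini_impurity(idx, y, len(centers))


def ga_cluster(points, y, k, pop_size=30, generations=60, mut_sigma=0.3,
               alpha=1.0, beta=1.0, seed=0):
    """Minimal GA: each individual is a list of k centers; the better half survives."""
    rng = random.Random(seed)

    def fitness(ind):
        return objective(points, y, ind, alpha, beta)

    # Initialize each center at a randomly chosen data point.
    pop = [[list(rng.choice(points)) for _ in range(k)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        elite = pop[: pop_size // 2]  # truncation selection with elitism
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            # Uniform crossover: each center is inherited whole from one parent.
            child = [list(c1 if rng.random() < 0.5 else c2) for c1, c2 in zip(p1, p2)]
            # Gaussian mutation of every coordinate.
            child = [[x + rng.gauss(0, mut_sigma) for x in c] for c in child]
            children.append(child)
        pop = elite + children
    return min(pop, key=fitness)


if __name__ == "__main__":
    r = random.Random(1)
    # Two Gaussian blobs; only three points in each blob carry a class label.
    pts = [(r.gauss(0, 0.5), r.gauss(0, 0.5)) for _ in range(20)] + \
          [(r.gauss(4, 0.5), r.gauss(4, 0.5)) for _ in range(20)]
    y = ["a"] * 3 + [None] * 17 + ["b"] * 3 + [None] * 17
    best = ga_cluster(pts, y, k=2)
    print(assign(pts, best))
```

The dispersion term is unbounded while the Gini term lies in [0, 1], so in practice `alpha` and `beta` would need to be tuned to balance the two.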

Ayhan Demiriz | Kristin P. Bennett