Biased clustering method for partially supervised classification

Clustering algorithms are often used as unsupervised classifiers when minimal information about the classification problem is available. Clustering usually assigns a unique label, corresponding to a class, to each data point. In most implementations of clustering algorithms, those labels are merely class indices and convey no information about which class each index represents. The classes corresponding to the formed clusters can be identified with heuristics or by using information from points in the clusters whose classes are known. This paper describes a hybrid clustering approach based on a biased fuzzy C-means algorithm. Bias values, corresponding to the expectation that a data point will be assigned to a class, are derived from simple image processing operations and included as weighting factors in the clustering algorithm. The final labels for the data retain the order imposed by the biases and can therefore be used to identify the classes of the clusters. The basic fuzzy C-means algorithm and the modifications needed to use biases are presented. Results of classifying both synthetic and image data with the method are presented and compared with the non-biased clustering results. The results obtained with the biased method are qualitatively superior to those of the non-biased method when conservative biases are used for the classes, and the method can be applied when it is difficult or impractical to use a completely supervised method.
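
To make the idea of biases acting as weighting factors concrete, the following is a minimal sketch and not the formulation used in the paper. It assumes one plausible form of the modification: the standard fuzzy C-means membership term is multiplied by a per-point, per-class bias and then renormalized, and the memberships are initialized from the biases so that cluster i stays tied to class i. The function name `biased_fcm`, its parameters, and the toy data are all hypothetical.

```python
def biased_fcm(points, biases, m=2.0, iters=50, eps=1e-9):
    """Fuzzy C-means with bias-weighted memberships (assumed form).

    points: list of feature tuples.
    biases[k][i]: positive prior weight of point k for class i (hypothetical input).
    Returns (labels, centers, memberships).
    """
    n_points = len(points)
    n_classes = len(biases[0])
    dim = len(points[0])
    # Initialize memberships from the normalized biases so that
    # cluster index i corresponds to class i from the start.
    u = [[b / sum(row) for b in row] for row in biases]
    centers = []
    for _ in range(iters):
        # Center update: standard FCM weighted mean with weights u^m.
        centers = []
        for i in range(n_classes):
            w = [u[k][i] ** m for k in range(n_points)]
            total = sum(w)
            centers.append(tuple(
                sum(w[k] * points[k][d] for k in range(n_points)) / total
                for d in range(dim)))
        # Membership update: standard FCM term d^(-2/(m-1)),
        # scaled by the bias (this scaling is the assumption).
        for k, x in enumerate(points):
            raw = []
            for i, c in enumerate(centers):
                dist2 = sum((a - b) ** 2 for a, b in zip(x, c)) + eps
                raw.append(biases[k][i] * dist2 ** (-1.0 / (m - 1.0)))
            s = sum(raw)
            u[k] = [r / s for r in raw]
    # Hard label: class with the largest membership.
    labels = [max(range(n_classes), key=lambda i: row[i]) for row in u]
    return labels, centers, u


# Toy 1-D data: two groups; only the first and last points carry a
# conservative (mild) bias, the rest are unbiased.
points = [(0.0,), (0.2,), (0.1,), (9.8,), (10.0,), (10.1,)]
biases = [[0.6, 0.4], [0.5, 0.5], [0.5, 0.5],
          [0.5, 0.5], [0.5, 0.5], [0.4, 0.6]]
labels, centers, memberships = biased_fcm(points, biases)
print(labels, centers)
```

In this sketch, setting every bias to the same value reduces the membership update to the ordinary fuzzy C-means rule, which gives a simple non-biased baseline for comparison.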