Partially supervised classification using weighted unsupervised clustering

This paper addresses a classification problem in which class definitions, whether given through training samples or by other means, are provided a priori for only one particular class of interest. Labeling enough samples to define every class present in a given data set, by collecting ground truth or otherwise, can require considerable time and effort. The problem is therefore important in practice, since one is often interested in identifying samples that belong to only one or a small number of classes. The problem is formulated as an unsupervised clustering problem with one initially known cluster. The definitions and statistics of the remaining classes are developed automatically through a weighted unsupervised clustering procedure that keeps the known cluster from losing its identity as the "class of interest". Once all the classes have been developed, a conventional supervised classifier, such as the maximum likelihood classifier, performs the classification. Experimental results on both simulated and real data verify the effectiveness of the proposed method.
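The abstract does not specify the weighting scheme, so the following is only a minimal Python sketch of the general idea under several assumptions: every class is modeled as a Gaussian with diagonal covariance, the clustering is an EM-style soft clustering loop, the unknown clusters are initialized by farthest-first selection, and a hypothetical parameter `w_train` gives the labeled samples of the class of interest a fixed, large weight whenever cluster 0's statistics are re-estimated. All function names are illustrative and are not taken from the paper.

```python
import math
import random


def gaussian_logpdf(x, mean, var):
    """Log density of a Gaussian with diagonal covariance (simplifying assumption)."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def weighted_stats(points, weights, floor=1e-6):
    """Weighted mean and per-dimension variance (floored to avoid collapse)."""
    total = sum(weights)
    dim = len(points[0])
    mean = [sum(w * p[d] for p, w in zip(points, weights)) / total for d in range(dim)]
    var = [max(sum(w * (p[d] - mean[d]) ** 2 for p, w in zip(points, weights)) / total, floor)
           for d in range(dim)]
    return mean, var


def posteriors(x, priors, means, variances):
    """Soft cluster memberships for one sample, via log-sum-exp for stability."""
    logs = [math.log(max(p, 1e-12)) + gaussian_logpdf(x, m, v)
            for p, m, v in zip(priors, means, variances)]
    top = max(logs)
    norm = sum(math.exp(s - top) for s in logs)
    return [math.exp(s - top) / norm for s in logs]


def weighted_clustering(train, unlabeled, n_other, w_train=10.0, iters=50):
    """Cluster 0 is the known class of interest; clusters 1..n_other are discovered."""
    mean0, var0 = weighted_stats(train, [1.0] * len(train))
    global_var = weighted_stats(unlabeled, [1.0] * len(unlabeled))[1]
    means, variances = [mean0], [var0]
    # Hypothetical choice: farthest-first initialization of the unknown clusters.
    for _ in range(n_other):
        far = max(unlabeled, key=lambda x: min(math.dist(x, m) for m in means))
        means.append(list(far))
        variances.append(list(global_var))
    k = n_other + 1
    priors = [1.0 / k] * k
    for _ in range(iters):
        post = [posteriors(x, priors, means, variances) for x in unlabeled]
        for j in range(k):
            pts, w = unlabeled, [p[j] for p in post]
            if j == 0:
                # Training samples always enter with weight w_train, which
                # anchors cluster 0 so it keeps its identity as the class of interest.
                pts = list(train) + list(unlabeled)
                w = [w_train] * len(train) + w
            if sum(w) > 1e-9:
                means[j], variances[j] = weighted_stats(pts, w)
            priors[j] = sum(p[j] for p in post) / len(unlabeled)
    return means, variances


def ml_classify(samples, means, variances):
    """Maximum likelihood rule: choose the class with the largest class-conditional density."""
    return [max(range(len(means)), key=lambda j: gaussian_logpdf(x, means[j], variances[j]))
            for x in samples]


# Usage on synthetic 2-D data: the class of interest sits at the origin,
# and two classes with no training samples sit elsewhere.
rng = random.Random(1)


def blob(cx, cy, n):
    return [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)) for _ in range(n)]


train = blob(0, 0, 20)
unlabeled = blob(0, 0, 50) + blob(5, 5, 50) + blob(-5, 5, 50)
means, variances = weighted_clustering(train, unlabeled, n_other=2)
labels = ml_classify(unlabeled, means, variances)
print(sum(1 for lab in labels if lab == 0), "samples assigned to the class of interest")
```

In this sketch, a large `w_train` keeps the known cluster from drifting toward neighboring unlabeled data, while the other clusters are free to form wherever the unlabeled samples concentrate. The final step uses the plain maximum likelihood rule (class-conditional densities only, no priors); a Bayes rule with the estimated priors would be an equally reasonable variant.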