Selective sampling based on the variation in label assignments

This paper presents a new selective sampling method for the active learning framework. Initially, a small training set T and a large unlabeled set Ω are given. The goal is to select, one by one, the most informative objects from Ω such that, after labeling by an expert, they guarantee the best improvement in classifier performance. Our sampling strategy measures the variation in the label assignments of the unlabeled set between the classifier trained on T and the classifiers trained on T with a single unlabeled object added under each of its possible labels. We compare the performance of our algorithm with two traditional procedures: random sampling and uncertainty sampling. We show empirically, across a range of datasets, that the proposed selective sampling method decreases the number of labeled instances needed to achieve a desired error for a fixed size of T. Experimental results on toy problems and the UCI datasets are presented.
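
The selection criterion described above can be illustrated with a minimal sketch. It is not the authors' implementation: the nearest-mean base classifier, the averaging of label changes over the candidate labels, and all function names (`fit_nearest_mean`, `predict`, `variation_scores`, `select_next`) are assumptions made for illustration, since the abstract does not specify them.

```python
from collections import defaultdict


def fit_nearest_mean(X, y):
    """Fit a nearest-mean classifier (assumed stand-in base learner): one centroid per class."""
    sums, counts = {}, defaultdict(int)
    for xi, yi in zip(X, y):
        if yi not in sums:
            sums[yi] = [0.0] * len(xi)
        sums[yi] = [s + v for s, v in zip(sums[yi], xi)]
        counts[yi] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict(centroids, X):
    """Assign each point the label of its closest centroid (ties go to the smallest label)."""
    def sqdist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return [min(sorted(centroids), key=lambda c: sqdist(x, centroids[c])) for x in X]


def variation_scores(T_X, T_y, pool):
    """Score each pool object by how much adding it to T changes the pool's label assignments.

    For every candidate x and every possible label, retrain on T plus (x, label)
    and count the pool predictions that differ from those of the classifier
    trained on T alone. Averaging over labels is an assumption of this sketch.
    """
    labels = sorted(set(T_y))
    base = predict(fit_nearest_mean(T_X, T_y), pool)
    scores = []
    for x in pool:
        changed = 0
        for lab in labels:
            model = fit_nearest_mean(T_X + [x], T_y + [lab])
            preds = predict(model, pool)
            changed += sum(p != b for p, b in zip(preds, base))
        scores.append(changed / len(labels))
    return scores


def select_next(T_X, T_y, pool):
    """Return the index of the pool object with the largest label variation."""
    scores = variation_scores(T_X, T_y, pool)
    return max(range(len(pool)), key=scores.__getitem__)


if __name__ == "__main__":
    # Toy 1-D problem: two labeled points and four unlabeled ones.
    T_X, T_y = [[0.0], [4.0]], [0, 1]
    pool = [[1.9], [2.1], [0.1], [3.9]]
    print(variation_scores(T_X, T_y, pool))
    print(select_next(T_X, T_y, pool))
```

In this toy problem the objects near the current decision boundary receive the highest scores, because adding either of them with either label flips the assignment of another pool object. The sketch retrains one classifier per candidate and per label, so its cost grows with the size of Ω times the number of classes.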