Paper information - Do they belong to the same class: active learning by querying pairwise label homogeneity

Traditional active learning methods ask experts to provide ground-truth labels for the queried instances, which can be expensive in practice. An alternative is to ask nonexpert labelers to do the labeling, but they may be unable to give definite class labels. In this paper, we propose a new active learning paradigm in which a nonexpert labeler is only asked whether a pair of instances belongs to the same class. To instantiate the proposed paradigm, we adopt the MinCut algorithm as the base classifier. We first construct a graph based on the pairwise distances between all labeled and unlabeled instances, and then repeatedly update the unlabeled edge weights on the max-flow paths in the graph. Finally, we select the subset of unlabeled nodes with the highest prediction confidence and add them to the labeled data set, which is used to learn a new classifier for the next round of active learning. Experimental results and comparisons with state-of-the-art methods demonstrate that our active learning paradigm can achieve good performance with nonexpert labelers.
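The graph-based pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the Gaussian edge weight, the `BIG` constant, the function names `build_graph` and `positive_side`, and the rule for turning a "same class" / "different class" answer into an edge weight are all assumptions made for illustration. The paper's update of edge weights along max-flow paths and its confidence-based selection step are not reproduced here. The sketch only shows how pairwise answers could reshape a distance graph before a min-cut classifier splits it.

```python
import math
from collections import deque

BIG = 1e6  # assumed stand-in for an edge that should never be cut


def build_graph(points, pos_seeds, neg_seeds, same_pairs=(), diff_pairs=()):
    """Build symmetric capacities cap[u][v] over nodes 0..n-1 plus 's' and 't'."""
    n = len(points)
    cap = {u: {} for u in list(range(n)) + ["s", "t"]}
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            cap[i][j] = cap[j][i] = math.exp(-d2)  # assumed similarity weight
    # Assumed encoding of pairwise homogeneity answers from the labeler.
    for i, j in same_pairs:
        cap[i][j] = cap[j][i] = BIG   # "same class": tie the two nodes together
    for i, j in diff_pairs:
        cap[i][j] = cap[j][i] = 0.0   # "different class": drop the edge
    for i in pos_seeds:               # labeled positives attach to the source
        cap["s"][i], cap[i]["s"] = BIG, 0.0
    for i in neg_seeds:               # labeled negatives attach to the sink
        cap[i]["t"], cap["t"][i] = BIG, 0.0
    return cap


def positive_side(cap, eps=1e-12):
    """Run Edmonds-Karp max-flow and return the nodes on the source side of the min cut."""
    res = {u: dict(nbrs) for u, nbrs in cap.items()}  # residual capacities
    while True:
        # Breadth-first search for an augmenting path from 's' to 't'.
        parent = {"s": None}
        queue = deque(["s"])
        while queue and "t" not in parent:
            u = queue.popleft()
            for v, c in res[u].items():
                if c > eps and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if "t" not in parent:
            # No path left: everything reached from 's' is on the source side.
            return set(parent) - {"s"}
        # Collect the path edges, then push the bottleneck flow along them.
        path, v = [], "t"
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        flow = min(res[u][v] for u, v in path)
        for u, v in path:
            res[u][v] -= flow
            res[v][u] += flow


# Two 1-D clusters plus one ambiguous point (index 6) halfway between them.
pts = [(0.0,), (0.2,), (0.4,), (3.0,), (3.2,), (3.4,), (1.7,)]
graph = build_graph(pts, pos_seeds=[0], neg_seeds=[5], same_pairs=[(6, 2)])
print(sorted(positive_side(graph)))
```

In this toy setup a single "same class" answer for the pair (6, 2) is enough to pull the ambiguous point onto the positive side of the cut, while answering "same class" for (6, 3) instead would push it to the negative side.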

Chengqi Zhang | Bin Li | Xingquan Zhu | Yifan Fu
