Cost-Effective Active Learning from Diverse Labelers

Traditional active learning assumes a single labeler that always returns the ground-truth label for a queried instance. In many applications, however, multiple labelers are available, offering labels of varying quality at different costs. In this paper, we perform active selection on both instances and labelers, aiming to improve the classification model as much as possible at the lowest cost. While the cost of a labeler is proportional to its overall labeling quality, we also observe that different labelers usually have diverse expertise, so a labeler with low overall quality can still provide accurate labels on some specific instances. Based on this observation, we propose a novel active selection criterion that evaluates the cost-effectiveness of instance-labeler pairs: the selected instance should be helpful for improving the classification model, and the selected labeler should be able to label that instance accurately at a relatively low cost. Experiments on both UCI and real crowdsourcing data sets demonstrate the superiority of the proposed approach in selecting cost-effective queries.
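The idea of scoring instance-labeler pairs can be illustrated with a minimal sketch. This is not the criterion proposed in the paper, whose exact form the abstract does not give. It assumes a simple stand-in score, namely the model's predictive uncertainty on the instance multiplied by the labeler's estimated accuracy on that instance and divided by the labeler's cost. The names `entropy`, `select_pair`, `pos_probs`, `accuracy`, and `costs`, as well as all numbers, are hypothetical.

```python
import math


def entropy(p):
    """Binary entropy (in bits) of a predicted positive-class probability."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def select_pair(pos_probs, accuracy, costs):
    """Return (instance, labeler, score) maximizing uncertainty * accuracy / cost.

    pos_probs[i]   : model's predicted P(y=1) for unlabeled instance i
    accuracy[j][i] : estimated probability that labeler j labels instance i correctly
    costs[j]       : price of one query to labeler j

    The score is an assumed illustration, not the paper's criterion.
    """
    best = None
    for i, p in enumerate(pos_probs):
        usefulness = entropy(p)  # how informative the instance is for the model
        for j, cost in enumerate(costs):
            score = usefulness * accuracy[j][i] / cost
            if best is None or score > best[2]:
                best = (i, j, score)
    return best


# Hypothetical data: labeler 0 is accurate everywhere but expensive;
# labeler 1 is cheap, weak overall, yet reliable on instance 1.
pos_probs = [0.95, 0.5, 0.6]
accuracy = [
    [0.95, 0.95, 0.95],
    [0.60, 0.90, 0.55],
]
costs = [3.0, 1.0]

instance, labeler, score = select_pair(pos_probs, accuracy, costs)
print(instance, labeler, round(score, 3))
```

With these made-up numbers the cheap labeler wins on the most uncertain instance, which mirrors the observation that a labeler with low overall quality can still be the cost-effective choice for specific instances.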
