Learning and Feature Selection under Budget Constraints in Crowdsourcing

The cost of data acquisition limits the amount of labeled data available to machine learning algorithms, at both the training and the testing phase. This problem is further exacerbated in real-world crowdsourcing applications, where labels are aggregated from multiple noisy answers. We tackle classification problems in which the underlying feature labels are unknown to the algorithm and a (noisy) label for any desired feature can be acquired at a fixed cost. This setting involves two types of budget constraints: the total cost of the feature labels available for learning at the training phase, and the cost of the features used for classification at the testing phase. We propose B-LEAFS, a novel budgeted learning and feature selection algorithm that jointly tackles this problem in the presence of noise. Experimental evaluation on synthetic and real-world crowdsourcing data demonstrates the practical applicability of our approach.
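
Below is a minimal, self-contained Python sketch of the problem setting just described, not of the B-LEAFS algorithm itself, whose details are not given in this abstract. The budget values, the uniform allocation of queries, the majority-vote aggregation, the correlation-based feature scoring, and the naive Bayes classifier are all illustrative assumptions standing in for the paper's actual design.

```python
# Illustrative sketch: budgeted acquisition of noisy feature labels at training
# time, followed by budget-constrained feature selection for the testing phase.
import numpy as np

rng = np.random.default_rng(0)

n_features, n_train, n_test = 10, 200, 100
unit_cost = 1.0           # cost of one noisy feature-label query (assumed)
train_budget = 6000.0     # total query budget at the training phase (assumed)
test_budget_features = 3  # max features consulted per test example (assumed)
noise = 0.2               # probability a crowd answer flips the true value

# Synthetic ground truth: binary features; class depends on a hidden subset.
true_weights = rng.normal(size=n_features) * (rng.random(n_features) < 0.4)
X_train = rng.integers(0, 2, size=(n_train, n_features))
X_test = rng.integers(0, 2, size=(n_test, n_features))
y_train = (X_train @ true_weights > 0).astype(int)
y_test = (X_test @ true_weights > 0).astype(int)

def noisy_query(value):
    """Simulate one crowd worker's (noisy) answer for a binary feature value."""
    return int(value) ^ int(rng.random() < noise)

# Training phase: spend the budget on repeated noisy queries, uniformly over
# all (example, feature) cells, and aggregate the answers by majority vote.
# (An adaptive policy would instead direct queries where uncertainty is high.)
queries_per_cell = max(1, int(train_budget / (unit_cost * n_train * n_features)))
votes = np.zeros((n_train, n_features))
for _ in range(queries_per_cell):
    for i in range(n_train):
        for j in range(n_features):
            votes[i, j] += noisy_query(X_train[i, j])
X_hat = (votes / queries_per_cell > 0.5).astype(int)

# Feature selection: score features by |correlation| with the class label on
# the aggregated data, keeping only as many as the test-time budget allows.
scores = np.abs([np.corrcoef(X_hat[:, j], y_train)[0, 1] for j in range(n_features)])
selected = np.argsort(scores)[-test_budget_features:]

# Testing phase: a naive Bayes classifier restricted to the selected features.
def fit_nb(X, y, cols):
    priors = np.array([(y == c).mean() for c in (0, 1)])
    # P(feature j = 1 | class c), with Laplace smoothing
    cond = np.array([[(X[y == c, j].sum() + 1) / ((y == c).sum() + 2)
                      for j in cols] for c in (0, 1)])
    return priors, cond

def predict_nb(row, cols, priors, cond):
    logp = np.log(priors).copy()
    for c in (0, 1):
        for k, j in enumerate(cols):
            p = cond[c, k]
            logp[c] += np.log(p) if row[j] == 1 else np.log(1 - p)
    return int(logp[1] > logp[0])

priors, cond = fit_nb(X_hat, y_train, selected)
preds = np.array([predict_nb(row, selected, priors, cond) for row in X_test])
print("test accuracy with", test_budget_features, "features:", (preds == y_test).mean())
```

In this sketch the two budgets from the abstract play distinct roles: raising `train_budget` buys more repeated queries per feature value and so cleans up the aggregated training matrix, while `test_budget_features` caps how many features the classifier may consult for each test example.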
