Optimal Stopping and Worker Selection in Crowdsourcing: an Adaptive Sequential Probability Ratio Test Framework

In this paper, we aim to solve a class of multiple testing problems under the Bayesian sequential decision framework. Our motivating application comes from binary labeling tasks in crowdsourcing, where the requester must simultaneously decide which worker to ask for a label and when to stop collecting labels under a budget constraint. We start with the binary hypothesis testing problem of determining the true label of a single object, and provide an optimal solution by casting it under the adaptive sequential probability ratio test (Ada-SPRT) framework. We characterize the structure of the optimal solution, i.e., the optimal adaptive sequential design, which minimizes the Bayes risk through the log-likelihood ratio statistic. We also develop a dynamic programming algorithm that efficiently approximates the optimal solution. For the multiple testing problem, we further propose an empirical Bayes approach for estimating class priors and show that our method has an average loss that converges to the minimal Bayes risk under the true model. Experiments on both simulated and real data show the robustness of our method and its superior labeling accuracy compared with several other recently proposed approaches.
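To make the stopping idea concrete, here is a minimal sketch of a plain sequential probability ratio test for one binary label. It is not the Ada-SPRT procedure of the paper: it assumes a symmetric one-coin worker model with known accuracies, asks workers greedily in order of accuracy (each at most once), and stops at a fixed symmetric threshold on the posterior log-odds. The names `sprt_label`, `accuracies`, `query`, `prior_pos`, and `threshold` are all hypothetical and chosen only for illustration.

```python
import math


def sprt_label(accuracies, query, prior_pos=0.5, threshold=3.0):
    """Sequentially collect +1/-1 labels until the posterior log-odds is decisive.

    accuracies: assumed-known probability that each worker reports the true label.
    query:      hypothetical callback, query(i) returns worker i's label (+1 or -1).
    prior_pos:  prior probability that the true label is +1.
    threshold:  stop once |log-odds| reaches this value (an illustrative choice).
    Returns (decision, number_of_queries, final_log_odds).
    """
    # Start from the prior log-odds of label +1 versus label -1.
    log_odds = math.log(prior_pos / (1.0 - prior_pos))

    # Simplifying assumption: ask the most accurate workers first, once each.
    order = sorted(range(len(accuracies)), key=lambda i: accuracies[i], reverse=True)

    n_queries = 0
    for i in order:
        if abs(log_odds) >= threshold:
            break  # evidence is already decisive, so stop spending budget
        p = accuracies[i]
        # Log-likelihood ratio contributed by one vote under the one-coin model.
        step = math.log(p / (1.0 - p))
        log_odds += step if query(i) == 1 else -step
        n_queries += 1

    decision = 1 if log_odds >= 0 else -1
    return decision, n_queries, log_odds


if __name__ == "__main__":
    # Toy run: every worker answers +1.
    print(sprt_label([0.7, 0.9, 0.8], lambda i: 1))
```

With a finite worker list, the loop also ends when the workers run out, which plays the role of a hard budget cap in this sketch. The paper's framework instead chooses workers and stopping times jointly to minimize the Bayes risk.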
