Near-Optimally Teaching the Crowd to Classify

How should we present training examples to learners to teach them classification rules? This problem arises naturally when training workers for crowdsourced labeling tasks, and is also motivated by challenges in data-driven online education. We propose a stochastic model of the learners, modeling them as randomly switching among hypotheses based on observed feedback. We then develop STRICT, an efficient algorithm for selecting examples to teach to workers. Our solution greedily maximizes a submodular surrogate objective function to select the examples shown to the learners. We prove that our strategy is competitive with the optimal teaching policy. Moreover, for the special case of linear separators, we prove that an exponential reduction in error probability can be achieved. Our experiments on simulated workers, as well as on three real image annotation tasks on Amazon Mechanical Turk, show the effectiveness of our teaching algorithm.
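To make the selection pattern concrete, below is a minimal Python sketch of greedy maximization of a monotone submodular objective, the general strategy the abstract describes. The objective used here (weighted set coverage) and all names such as `greedy_teaching_set` are illustrative stand-ins, not the paper's actual STRICT surrogate.

```python
def greedy_teaching_set(candidates, objective, budget):
    """Greedily pick up to `budget` examples, each time taking the one with
    the largest marginal gain in `objective` over the current set.
    For monotone submodular objectives this gives the classic (1 - 1/e)
    approximation guarantee."""
    chosen = []
    for _ in range(budget):
        best, best_gain = None, float("-inf")
        for x in candidates:
            if x in chosen:
                continue
            gain = objective(chosen + [x]) - objective(chosen)
            if gain > best_gain:
                best, best_gain = x, gain
        if best is None:
            break
        chosen.append(best)
    return chosen


# Toy usage with a weighted-coverage objective (a standard monotone
# submodular function); the data here is purely illustrative.
universe_weights = {"a": 3.0, "b": 1.0, "c": 2.0}
covers = {1: {"a"}, 2: {"a", "b"}, 3: {"c"}}

def coverage(selected):
    covered = set().union(*(covers[x] for x in selected)) if selected else set()
    return sum(universe_weights[u] for u in covered)

print(greedy_teaching_set(list(covers), coverage, budget=2))  # -> [2, 3]
```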
