Beat the Machine: Challenging Humans to Find a Predictive Model's “Unknown Unknowns”

We present techniques for gathering data that expose errors made by automatic predictive models. In certain common settings, traditional methods for evaluating predictive models tend to miss rare but important errors, most importantly cases for which the model is confident of its prediction (but wrong). In this article, we present a system that, in a game-like setting, asks humans to identify cases that will cause the predictive-model-based system to fail. Such techniques are valuable for discovering problematic cases that may not reveal themselves during normal operation of the system, including cases that are rare but catastrophic. We describe the design of the system, including design iterations that did not quite work. In particular, the system incentivizes humans to provide examples that are difficult for the model to handle by paying a reward proportional to the magnitude of the predictive model's error (a minimal sketch of this payout rule appears below). The humans are asked to "Beat the Machine" and find cases where the automatic model ("the Machine") is wrong. Experiments show that humans using Beat the Machine identify more errors than traditional techniques for discovering errors in predictive models, and, indeed, they identify many more errors where the machine is (wrongly) confident it is correct. Furthermore, the cases the humans identify seem to be not simply outliers, but coherent areas missed completely by the model. Beat the Machine identifies the "unknown unknowns." Beat the Machine has been deployed at industrial scale by several companies; the main impact has been that firms are changing their perspective on, and practice of, evaluating predictive models.

"There are known knowns. These are things we know that we know. There are known unknowns. That is to say, there are things that we know we don't know. But there are also unknown unknowns. There are things we don't know we don't know." (Donald Rumsfeld)
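The core incentive described above is that a worker's payout grows with the magnitude of the model's error on the submitted case. The Python sketch below illustrates one way such a payout could be computed, assuming the model outputs a probability and the submitted case is later verified; the function and parameter names (btm_reward, base_reward) are illustrative assumptions, not the paper's actual implementation.

def btm_reward(model_prob_positive: float, true_label: int, base_reward: float = 1.0) -> float:
    """Pay more when the model is confidently wrong (illustrative sketch, not the paper's code).

    model_prob_positive: the model's predicted probability that the case is positive.
    true_label: the verified label for the submitted case (1 = positive, 0 = negative).
    base_reward: maximum payout, earned when the model is maximally wrong.
    """
    # Magnitude of the model's error: near 0 when the confident prediction matches
    # the verified label, near 1 when the model is confidently wrong.
    error = abs(true_label - model_prob_positive)
    return base_reward * error


# Example: the model is 95% sure a page is benign (negative), but it is actually
# objectionable (positive) -- a confident miss, so the worker earns close to the maximum.
print(btm_reward(model_prob_positive=0.05, true_label=1))  # 0.95
print(btm_reward(model_prob_positive=0.90, true_label=1))  # 0.10: model nearly right, small reward

Under a scheme like this, a case the model gets almost right earns almost nothing, so workers are steered toward the confidently wrong cases, the "unknown unknowns."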
