Near-Optimal Bayesian Active Learning with Noisy Observations

We tackle the fundamental problem of Bayesian active learning with noise, where we need to adaptively select from a number of expensive tests in order to identify an unknown hypothesis sampled from a known prior distribution. In the case of noise-free observations, a greedy algorithm called generalized binary search (GBS) is known to perform near-optimally. We show that if the observations are noisy, perhaps surprisingly, GBS can perform very poorly. We develop EC², a novel greedy active learning algorithm, and prove that it is competitive with the optimal policy, thus obtaining the first competitiveness guarantees for Bayesian active learning with noisy observations. Our bounds rely on a recently discovered diminishing-returns property called adaptive submodularity, which generalizes the classical notion of submodular set functions to adaptive policies. Our results hold even if the tests have non-uniform cost and their noise is correlated. We also propose EffECXtive, a particularly fast approximation of EC², and evaluate it on a Bayesian experimental design problem involving human subjects, intended to tease apart competing economic theories of how people make decisions under uncertainty.
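
To make the EC² selection rule concrete, here is a minimal Python sketch of one greedy step, under simplifying assumptions: each hypothesis has deterministic test outcomes (a noisy problem is assumed to have been reduced to this form by augmenting hypotheses with noise realizations), edges connect hypotheses in different equivalence classes with weight P(h)P(h'), and the rule picks the test with the largest expected edge weight cut per unit cost. The function name, array layout, and toy data below are our own illustrative choices, not the authors' code.

```python
import numpy as np

def ec2_greedy_test(prior, labels, outcomes, costs):
    """One selection step of the EC^2 greedy rule (illustrative sketch).

    prior    : (n,) array, P(h) for each hypothesis h
    labels   : (n,) array, equivalence class (decision region) of h
    outcomes : (n, T) array, deterministic outcome of test t under h
               (noisy settings assumed reduced to this form by
               augmenting hypotheses with noise realizations)
    costs    : (T,) array, cost of each test
    Returns the test maximizing expected edge weight cut per unit cost.
    """
    n, T = outcomes.shape
    # Edges connect hypotheses in *different* classes, weighted by P(h)P(h').
    W = np.outer(prior, prior) * (labels[:, None] != labels[None, :])
    total_weight = W.sum()
    best_test, best_gain = 0, -np.inf
    for t in range(T):
        expected_remaining = 0.0
        for x in np.unique(outcomes[:, t]):
            consistent = outcomes[:, t] == x   # hypotheses agreeing with x
            p_x = prior[consistent].sum()      # probability of outcome x
            # An edge survives only if both endpoints remain consistent.
            expected_remaining += p_x * W[np.ix_(consistent, consistent)].sum()
        gain = (total_weight - expected_remaining) / costs[t]
        if gain > best_gain:
            best_test, best_gain = t, gain
    return best_test

# Toy usage: 4 hypotheses in 2 decision regions, 3 binary tests of unit cost.
prior = np.array([0.4, 0.3, 0.2, 0.1])
labels = np.array([0, 0, 1, 1])
outcomes = np.array([[0, 0, 1],
                     [0, 1, 0],
                     [1, 0, 0],
                     [1, 1, 1]])
print(ec2_greedy_test(prior, labels, outcomes, np.ones(3)))
```

A full policy would repeat this step after conditioning the prior and outcome matrix on each observed result; it is the adaptive submodularity of the expected-cut objective that underlies the competitiveness guarantee described above.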
