Twenty Questions with Noise: Bayes Optimal Policies for Entropy Loss

We consider the problem of twenty questions with noisy answers, in which we seek to find a target by repeatedly choosing a set, asking an oracle whether the target lies in this set, and obtaining an answer corrupted by noise. Starting with a prior distribution on the target’s location, we seek to minimize the expected entropy of the posterior distribution. We formulate this problem as a dynamic program and show that any policy optimizing the one-step expected reduction in entropy is also optimal over the full horizon. Two such Bayes optimal policies are presented: one generalizes the probabilistic bisection policy due to Horstein and the other asks a deterministic set of questions. We study the structural properties of the latter, and illustrate its use in a computer vision application.
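The probabilistic bisection idea mentioned above can be illustrated with a minimal sketch. It is not the paper's implementation and rests on several assumptions: the target lives on a finite grid of cells, each question asks "is the target in cells 0..k?" with k chosen at the posterior median, and the oracle's answer is flipped with a known probability `eps` (symmetric noise). The names `entropy`, `median_split`, `update`, and `probabilistic_bisection` are hypothetical and chosen for illustration only.

```python
import math
import random


def entropy(p):
    """Shannon entropy, in bits, of a discrete distribution."""
    return -sum(q * math.log2(q) for q in p if q > 0)


def median_split(p):
    """Smallest index k such that cells 0..k hold at least half the mass."""
    total = 0.0
    for k, q in enumerate(p):
        total += q
        if total >= 0.5:
            return k
    return len(p) - 1


def update(p, k, answer, eps):
    """Bayes update after asking 'is the target in cells 0..k?'.

    Assumption: the oracle's answer is flipped with probability eps.
    """
    post = []
    for i, q in enumerate(p):
        truth = i <= k
        likelihood = 1 - eps if truth == answer else eps
        post.append(q * likelihood)
    z = sum(post)
    return [q / z for q in post]


def probabilistic_bisection(target, n_cells, n_questions, eps, rng):
    """Ask n_questions median questions, starting from a uniform prior."""
    p = [1.0 / n_cells] * n_cells
    for _ in range(n_questions):
        k = median_split(p)
        truth = target <= k
        # Simulated noisy oracle: flip the true answer with probability eps.
        answer = truth if rng.random() >= eps else (not truth)
        p = update(p, k, answer, eps)
    return p


if __name__ == "__main__":
    rng = random.Random(0)
    posterior = probabilistic_bisection(
        target=37, n_cells=64, n_questions=40, eps=0.2, rng=rng
    )
    print("prior entropy (bits):", entropy([1.0 / 64] * 64))
    print("posterior entropy (bits):", entropy(posterior))
```

Under these assumptions, a question that splits the posterior mass exactly in half reduces the expected entropy by 1 - h(eps) bits, where h is the binary entropy function. On a discrete grid the median cell rarely splits the mass exactly, so this sketch only approximates that reduction after the first few questions.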
