Extrinsic Jensen-Shannon divergence and noisy Bayesian active learning

Consider the problem of noisy Bayesian active learning, given a sample space, a finite label set, and a finite set of label-generating functions from the sample space to the label set, known as the function class. The objective is to identify the function in the function class that generates the labels, using as few label queries as possible and with low probability of error, despite possible corruption of the labels by independent noise. Achieving this objective relies on selecting queries strategically and adaptively. This problem generalizes noisy generalized binary search [1]. Using the connection between this Bayesian active learning problem and active hypothesis testing, a heuristic based on the Extrinsic Jensen-Shannon divergence [2] is analyzed and general upper bounds are obtained. The performance of this heuristic is compared with state-of-the-art strategies for noisy generalized binary search. When the function class has a threshold structure, the heuristic is shown to improve on previous results and, in particular, to be order optimal.
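To make the setup concrete, the following is a minimal sketch of a query-selection loop driven by the Extrinsic Jensen-Shannon (EJS) divergence, under several assumptions that are not taken from the text above: labels are corrupted by a symmetric noise model with a known flip probability `noise`, the query rule greedily maximizes EJS under the current posterior (the heuristic analyzed in the paper may differ in detail), and all function and parameter names are illustrative. The EJS expression used is the standard one, sum over i of rho_i * D(q_i || mixture of the other q_j weighted by rho_j / (1 - rho_i)).

```python
import math
import random


def kl(p, q):
    """KL divergence D(p || q) in nats between two finite distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def label_dist(h, x, labels, noise):
    """Noisy label distribution when function h is queried at x.

    Assumed noise model: the true label survives with probability 1 - noise;
    otherwise it is replaced by one of the other labels, chosen uniformly.
    """
    true_label = h(x)
    flip = noise / (len(labels) - 1)
    return [1 - noise if y == true_label else flip for y in labels]


def ejs(posterior, dists):
    """EJS divergence of one query, given per-hypothesis label distributions."""
    total = 0.0
    for i, rho in enumerate(posterior):
        rest = sum(r for j, r in enumerate(posterior) if j != i)
        if rho <= 0.0 or rest <= 0.0:
            continue  # term is zero or the extrinsic mixture is undefined
        mix = [
            sum(posterior[j] * dists[j][k] for j in range(len(dists)) if j != i) / rest
            for k in range(len(dists[i]))
        ]
        total += rho * kl(dists[i], mix)
    return total


def select_query(posterior, functions, queries, labels, noise):
    """Greedy rule (an assumption): pick the query with the largest EJS."""
    def score(x):
        return ejs(posterior, [label_dist(h, x, labels, noise) for h in functions])
    return max(queries, key=score)


def update(posterior, functions, x, y, labels, noise):
    """Bayes update of the posterior after observing noisy label y at query x."""
    k = labels.index(y)
    weights = [rho * label_dist(h, x, labels, noise)[k]
               for rho, h in zip(posterior, functions)]
    z = sum(weights)
    return [w / z for w in weights]


def simulate(true_index, functions, queries, labels, noise, n_queries, seed=0):
    """Run the greedy EJS loop against a simulated noisy oracle."""
    rng = random.Random(seed)
    posterior = [1.0 / len(functions)] * len(functions)
    for _ in range(n_queries):
        x = select_query(posterior, functions, queries, labels, noise)
        dist = label_dist(functions[true_index], x, labels, noise)
        y = rng.choices(labels, weights=dist)[0]
        posterior = update(posterior, functions, x, y, labels, noise)
    return posterior


if __name__ == "__main__":
    # Hypothetical threshold class: h_t(x) = 1 if x >= t, for t = 1..8.
    thresholds = [lambda x, t=t: int(x >= t) for t in range(1, 9)]
    post = simulate(true_index=3, functions=thresholds, queries=list(range(9)),
                    labels=[0, 1], noise=0.1, n_queries=30)
    print([round(p, 3) for p in post])
```

In this sketch a query on which every function in the class agrees has EJS equal to zero, so the greedy rule never spends a query on it; informative queries are those that separate a likely function from the posterior-weighted mixture of the rest.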

[1] Andreas Krause, et al. Near-Optimal Bayesian Active Learning with Noisy Observations, 2010, NIPS.

[2] Tara Javidi, et al. Extrinsic Jensen–Shannon Divergence: Applications to Variable-Length Coding, 2013, IEEE Transactions on Information Theory.

[3] Robert D. Nowak, et al. Noisy Generalized Binary Search, 2009, NIPS.

[4] Thomas M. Cover, et al. Elements of Information Theory (2nd ed.), 2006.

[5] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.

[6] D. A. Bell, et al. Information Theory and Reliable Communication, 1969.

[7] H. Vincent Poor, et al. Feedback in the Non-Asymptotic Regime, 2011, IEEE Transactions on Information Theory.

[8] Tara Javidi, et al. Noisy Bayesian Active Learning, 2012, 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton).

[9] Tara Javidi, et al. Extrinsic Jensen-Shannon Divergence with Application in Active Hypothesis Testing, 2012, IEEE International Symposium on Information Theory Proceedings.

[10] Thomas M. Cover, et al. Elements of Information Theory, 2005.

[11] Tara Javidi, et al. Active Sequential Hypothesis Testing, 2012, ArXiv.

[12] Robert D. Nowak, et al. The Geometry of Generalized Binary Search, 2009, IEEE Transactions on Information Theory.

[13] R. Nowak, et al. Generalized Binary Search, 2008, 46th Annual Allerton Conference on Communication, Control, and Computing.

[14] Emre Telatar, et al. A Simple Converse of Burnashev's Reliability Function, 2006, IEEE Transactions on Information Theory.

[15] M. V. Burnašev. Sequential Discrimination of Hypotheses with Control of Observations, 1980.

[16] M. DeGroot. Optimal Statistical Decisions, 1970.