Active learning and hypothesis testing

This dissertation considers a generalization of the classical hypothesis testing problem. Suppose there are M hypotheses of interest, exactly one of which is true. A Bayesian decision maker is responsible for collecting observation samples so as to quickly improve its information about the true hypothesis, while accounting for the penalty of a wrong declaration. In contrast to the classical hypothesis testing problem, at any given time the decision maker can choose one of the available sensing actions and hence exert some control over the information content of the collected samples. This generalization, referred to as active hypothesis testing, arises naturally in a broad spectrum of applications such as medical diagnosis, cognition, communication, sensor management, image inspection, generalized search, and group testing.

The first part of the dissertation provides a theoretical analysis of active hypothesis testing. Using results from sequential analysis and dynamic programming, lower bounds on the optimal performance are established. These lower bounds are complementary across different ranges of the problem parameters, and together they characterize the fundamental limits on the maximum achievable information acquisition rate and the optimal reliability. Moreover, upper bounds are obtained by analyzing the proposed heuristic policies for dynamic selection of actions. From the obtained bounds, sufficient conditions are derived under which the maximum information acquisition rate and reliability are achieved, establishing the asymptotic optimality of the proposed heuristics.

The second part of the dissertation applies the results of the first part to three important special cases of active hypothesis testing. Chapter 5 considers the problem of conveying a message over discrete memoryless channels with noiseless feedback. Chapter 6 studies two-dimensional search for a target in an image against a background of distractors. Finally, Chapter 7 investigates active learning for multiclass classification, where the outcomes of label queries are corrupted by noise. In each of these chapters, the results of the first part are specialized, new results are obtained, and many known results are recovered with concise proofs.
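The basic loop of active hypothesis testing (pick a sensing action, observe a sample, update the posterior over the M hypotheses, stop once confident) can be illustrated with a short simulation. The sketch below is not one of the dissertation's proposed heuristics; it is a simplified, pure-action variant of Chernoff's classical procedure (Chernoff, 1959), whose original form may randomize over actions. The observation model `OBS_MODEL`, the uniform prior, the stopping threshold `eps`, and all function names are hypothetical choices made for illustration.

```python
import math
import random

def kl(p, q):
    """KL divergence D(p || q) in nats between two finite distributions."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue
        if qi == 0.0:
            return math.inf
        total += pi * math.log(pi / qi)
    return total

def select_action(obs_model, belief):
    """Pure-action simplification of Chernoff's rule: pick the action that best
    separates the currently most likely hypothesis from its closest rival."""
    m = len(belief)
    i_hat = max(range(m), key=lambda i: belief[i])
    def separation(a):
        return min(kl(obs_model[a][i_hat], obs_model[a][j])
                   for j in range(m) if j != i_hat)
    return max(range(len(obs_model)), key=separation)

def bayes_update(obs_model, belief, action, y):
    """Posterior over hypotheses after observing outcome y under `action`."""
    unnorm = [b * obs_model[action][i][y] for i, b in enumerate(belief)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def active_test(obs_model, true_h, eps=1e-3, max_steps=10_000, rng=None):
    """Run the sequential test until some hypothesis has posterior >= 1 - eps.

    obs_model[a][i][y] = P(Y = y | action a, hypothesis i); the prior is uniform
    (an assumption). Returns (declared hypothesis, samples used, final posterior).
    """
    rng = rng or random.Random(0)
    m = len(obs_model[0])
    belief = [1.0 / m] * m
    for t in range(1, max_steps + 1):
        a = select_action(obs_model, belief)
        dist = obs_model[a][true_h]
        y = rng.choices(range(len(dist)), weights=dist)[0]  # simulated sensor output
        belief = bayes_update(obs_model, belief, a, y)
        if max(belief) >= 1.0 - eps:
            break
    declared = max(range(m), key=lambda i: belief[i])
    return declared, t, belief

# Hypothetical toy model: 3 hypotheses, 2 binary sensing actions.
# Action 0 separates H0 from {H1, H2}; action 1 separates H1 from H2.
OBS_MODEL = [
    [[0.1, 0.9], [0.8, 0.2], [0.8, 0.2]],
    [[0.5, 0.5], [0.1, 0.9], [0.9, 0.1]],
]
```

In this toy model no single action is informative about every pair of hypotheses, which is what makes the choice of action matter: at the uniform prior the rule picks action 0, but as soon as H1 or H2 leads it switches to action 1, because action 0 cannot tell those two apart.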
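Noisy active learning can be cast in the same framework: each candidate classifier is a hypothesis, each label query is a sensing action, and the noisy label is the observation. As a hedged illustration only, the sketch below restricts this to one-dimensional threshold classifiers whose labels are flipped independently with probability p < 1/2. It queries the point that most evenly splits the current posterior, in the spirit of noisy generalized binary search (Nowak) and closely related to Horstein's posterior-median feedback scheme. The function name, the uniform prior, and the noise model are assumptions, not the dissertation's construction.

```python
import random

def noisy_threshold_search(n, theta, p, eps=1e-4, max_queries=10_000, rng=None):
    """Find an unknown threshold theta in {0, ..., n-1} (n >= 2) from noisy queries.

    Querying point x returns 1 if theta <= x else 0, flipped with probability
    p < 1/2. Each query is the most balanced split of the current posterior,
    i.e. the x whose P(theta <= x) is closest to 1/2.
    """
    rng = rng or random.Random(0)
    post = [1.0 / n] * n  # uniform prior over threshold positions (assumption)
    for q in range(1, max_queries + 1):
        # Cumulative posterior P(theta <= x) for informative queries x = 0..n-2.
        cum, running = [], 0.0
        for x in range(n - 1):
            running += post[x]
            cum.append(running)
        x = min(range(n - 1), key=lambda k: abs(cum[k] - 0.5))
        y = int(theta <= x) ^ int(rng.random() < p)  # noisy oracle answer
        # Bayes update: thresholds consistent with y are weighted by 1-p, others by p.
        post = [v * ((1 - p) if int(t <= x) == y else p) for t, v in enumerate(post)]
        z = sum(post)
        post = [v / z for v in post]
        if max(post) >= 1 - eps:
            break
    estimate = max(range(n), key=lambda t: post[t])
    return estimate, q
```

Choosing the most balanced split, rather than always the first point where the cumulative posterior reaches 1/2, matters on a discrete grid: it lets the search alternately test the boundary just below and just above the leading candidate, so neighbouring thresholds on both sides are eventually ruled out.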