Active Model Selection

Classical learning assumes the learner is given a labeled data sample, from which it learns a model. The field of Active Learning deals with the situation where the learner begins not with a training sample, but instead with resources that it can use to obtain information to help identify the optimal model. To better understand this task, this paper presents and analyses the simplified "(budgeted) active model selection" version, which captures the pure exploration aspect of many active learning problems in a clean and simple problem formulation. Here the learner can use a fixed budget of "model probes" (where each probe evaluates the specified model on a random indistinguishable instance) to identify which of a given set of possible models has the highest expected accuracy. Our goal is a policy that sequentially determines which model to probe next, based on the information observed so far. We present a formal description of this task, and show that it is NP-hard in general. We then investigate a number of algorithms for this task, including several existing ones (e.g., "Round-Robin", "Interval Estimation", "Gittins") as well as some novel ones (e.g., "Biased-Robin"), describing first their approximation properties and then their empirical performance on various problem instances. We observe empirically that the simple Biased-Robin algorithm significantly outperforms the other algorithms in the case of identical costs and priors.
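To make the probe/budget setting concrete, below is a minimal Python sketch of a Biased-Robin-style policy in the spirit of the play-the-winner rule: keep probing the current model while its probes succeed, and move to the next model after a failure. The Bernoulli probe simulation, function names, and tie-breaking rule are illustrative assumptions, not the paper's implementation.

```python
import random

def biased_robin(n_models, budget, probe):
    """Play-the-winner style policy sketch: probe the current model while it
    succeeds; on a failure, move to the next model in round-robin order.
    `probe(i)` returns True/False for one evaluation of model i on a random
    labeled instance.  Returns the index with the best observed accuracy
    once the probe budget is exhausted."""
    successes = [0] * n_models   # correct predictions observed per model
    trials = [0] * n_models      # probes spent per model
    current = 0
    for _ in range(budget):
        outcome = probe(current)
        trials[current] += 1
        successes[current] += int(outcome)
        if not outcome:          # switch models only after a failed probe
            current = (current + 1) % n_models
    # report the model with highest empirical accuracy (untried models score 0)
    def score(i):
        return successes[i] / trials[i] if trials[i] else 0.0
    return max(range(n_models), key=score)

# Toy usage: three classifiers with hidden true accuracies, simulated probes.
if __name__ == "__main__":
    true_acc = [0.6, 0.75, 0.7]
    best = biased_robin(n_models=3, budget=50,
                        probe=lambda i: random.random() < true_acc[i])
    print("selected model:", best)
```

The same loop structure accommodates the other policies mentioned above; only the rule for choosing which model to probe next changes (e.g., Round-Robin cycles unconditionally, while Interval Estimation probes the model with the highest upper confidence bound).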
