Experience-efficient learning in associative bandit problems

We formalize the associative bandit problem framework introduced by Kaelbling as a learning-theory problem. The learning environment is modeled as a k-armed bandit where arm payoffs are conditioned on an observable input presented on each trial. We show that, if the payoff functions are constrained to a known hypothesis class, learning can be performed efficiently, with sample complexity polynomial in the VC dimension of this class. We formally reduce the associative bandit problem to PAC classification, producing an efficient associative-bandit algorithm for any hypothesis class for which efficient classification algorithms are known. We demonstrate the approach empirically on a scalable concept class.
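As a minimal sketch of the interaction protocol the abstract describes (the notation $x_t$, $a_t$, $r_t$, $f_a$, $X$, and $\mathcal{H}$ is introduced here for illustration and is not taken from the paper), on each trial $t = 1, 2, \ldots$:
\begin{enumerate}
  \item the environment presents an observable input $x_t \in X$;
  \item the learner selects an arm $a_t \in \{1, \ldots, k\}$;
  \item the learner receives a stochastic payoff $r_t \in [0, 1]$ with
        $\mathrm{E}[\, r_t \mid x_t, a_t \,] = f_{a_t}(x_t)$,
\end{enumerate}
where each payoff function $f_a$ is assumed to lie in the known hypothesis class $\mathcal{H}$.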