Efficient Exploration With Latent Structure

When interacting with a new environment, a robot can improve its online performance by efficiently exploring the effects of its actions. The efficiency of exploration can be increased significantly by modeling latent structure and using it to generalize across experiences. We give a theoretical formulation of the problem of exploration with latent structure, analyze several algorithms, and prove matching lower bounds. We demonstrate our algorithmic ideas on a simple robot car repeatedly traversing a path with two different surface properties.