Dimension reduction and its application to model-based exploration in continuous spaces

The sample complexity of a reinforcement-learning algorithm is closely tied to how efficiently it explores, which in turn depends critically on the effective size of its state space. This paper proposes a new exploration mechanism for model-based algorithms in continuous state spaces that automatically discovers the relevant dimensions of the environment. We show that this information can be used to dramatically reduce the algorithm's sample complexity relative to conventional exploration techniques. The improvement is achieved by maintaining a low-dimensional representation of the transition function. Empirical evaluations in several environments, including simulation benchmarks and a real robotics domain, suggest that the new method outperforms state-of-the-art algorithms and that its behavior is robust and stable.
