Improving state-action space exploration in reinforcement learning using geometric properties

Learning a model, or learning a policy that optimizes some objective function, relies on data sets that describe the behavior of the system. When such data are unavailable or insufficient, additional data may be generated through new experiments (if feasible) or through simulations (if an accurate model is available). In this paper we describe a third alternative, based on the availability of a qualitative model of the physical system. In particular, we show how the number of experiments required for reinforcement learning can be reduced by leveraging geometric properties of the system. These geometric properties are independent of any particular instantiation of the qualitative model. As an illustrative example, we apply our approach to a cart-pole system.
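To make the general idea concrete (this is an illustrative sketch, not the paper's specific construction), the snippet below shows one way a geometric property, the left-right mirror symmetry of cart-pole dynamics, can reduce the number of physical experiments: every observed transition yields a second valid transition for free. The function names are hypothetical, and the state layout assumes the common cart-pole convention [x, x_dot, theta, theta_dot] with a binary push-left/push-right action.

```python
import numpy as np

def mirror_transition(state, action, reward, next_state):
    """Reflect a cart-pole transition through the vertical plane.

    The cart-pole equations of motion are invariant under negating
    the cart position/velocity and pole angle/angular velocity while
    swapping the push-left and push-right actions, so the mirrored
    tuple is also a valid sample of the system's behavior. (Assumes
    a reward that is itself symmetric, as in the standard task.)
    """
    mirrored_state = -np.asarray(state)
    mirrored_next = -np.asarray(next_state)
    mirrored_action = 1 - action  # swap push-left (0) and push-right (1)
    return mirrored_state, mirrored_action, reward, mirrored_next

def augment(transitions):
    """Return the original transitions plus their mirror images,
    doubling the data obtained from each real experiment."""
    out = list(transitions)
    out.extend(mirror_transition(*t) for t in transitions)
    return out

# Example: one observed transition yields two training samples.
s = np.array([0.10, -0.20, 0.05, 0.30])      # [x, x_dot, theta, theta_dot]
s_next = np.array([0.096, -0.15, 0.056, 0.25])
data = augment([(s, 1, 1.0, s_next)])
```

Because the symmetry holds for any parameter values (cart mass, pole length, and so on), such augmentation depends only on the qualitative model, not on a particular instantiation of it.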