Paper information - An high-efficient online reinforcement learning algorithm for continuous-state systems

An high-efficient online reinforcement learning algorithm for continuous-state systems

In this paper, we consider continuous-state systems and pursue a near-optimal policy through online learning. We propose a new online reinforcement learning algorithm, MSEC (Multi-Samples in Each Cell). The algorithm combines state aggregation with the principle of efficient exploration, making heavy use of the samples observed online. More concretely, we lay a grid over the continuous state space and partition it into cells. A near-upper Q iteration operator is then defined that uses the samples in each cell to produce a near-upper Q function, whose greedy policy explores efficiently. MSEC is entirely model-free: no knowledge of the system dynamics is required for implementation, and knowledge of the system is collected during online learning. Based on the PAC (Probably Approximately Correct) principle, MSEC finds a near-optimal policy online within a finite time bound. To test its performance, an inverted pendulum is simulated, and the results show that the new algorithm is suitable for solving online optimal control problems.
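The abstract does not give the algorithm's details, so the following is only a minimal sketch of the two ideas it names: partitioning a continuous state space with a grid, and backing up an optimistic Q value from the samples stored in each cell. The function names `cell_index` and `optimistic_q`, the sample format `(reward, next_cell)`, and the parameters `k` and `v_max` are all assumptions made for illustration. In particular, the backup shown is a generic optimism-under-uncertainty rule, not the paper's near-upper Q iteration operator.

```python
from collections import defaultdict


def cell_index(state, lows, highs, bins):
    """Map a continuous state to the tuple index of its grid cell.

    lows/highs/bins give, per dimension, the range and number of cells
    (hypothetical parameters; the abstract does not specify the grid).
    """
    idx = []
    for x, lo, hi, n in zip(state, lows, highs, bins):
        frac = (x - lo) / (hi - lo)          # position within the range
        i = int(frac * n)                    # raw cell number
        idx.append(min(max(i, 0), n - 1))    # clip out-of-range states
    return tuple(idx)


def optimistic_q(samples, q, actions, cell, action, gamma, v_max, k):
    """Assumed optimistic backup for one (cell, action) pair.

    With fewer than k stored samples the pair is treated as unknown and
    given the upper bound v_max, which makes a greedy policy seek it out.
    Otherwise the one-step backups of the stored samples are averaged.
    """
    data = samples[(cell, action)]
    if len(data) < k:
        return v_max
    total = 0.0
    for reward, next_cell in data:
        # Unvisited successor pairs also default to the upper bound.
        best_next = max(q.get((next_cell, a), v_max) for a in actions)
        total += reward + gamma * best_next
    return total / len(data)


# Example: a two-dimensional state, such as an angle and an angular velocity.
lows, highs, bins = (-3.0, -8.0), (3.0, 8.0), (6, 4)
samples = defaultdict(list)   # (cell, action) -> list of (reward, next_cell)
q = {}                        # (cell, action) -> current Q estimate
cell = cell_index((0.0, 0.0), lows, highs, bins)
```

Keeping several samples per cell, rather than a single visit count, is what lets the backup average over observed transitions; how many samples are needed and how the upper bound is tightened are exactly the points the paper's analysis would have to settle.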

Dongbin Zhao | Haibo He | Yuanheng Zhu
