Paper Information - On-Line EM Reinforcement Learning for Automatic Control of Continuous Dynamical Systems

On-Line EM Reinforcement Learning for Automatic Control of Continuous Dynamical Systems

In this paper, we propose a new reinforcement learning (RL) method for dynamical systems with continuous state and action spaces. The method has an actor-critic architecture: the critic approximates the Q-function, and the actor approximates a stochastic soft-max policy that depends on the Q-function. Both the critic and the actor are trained with an on-line EM algorithm. We apply the method to two control problems; computer simulations of both tasks show that it acquires good control after a few learning trials.
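To make the idea of a soft-max policy that depends on the Q-function concrete, here is a minimal sketch. It is not the paper's method: it assumes a small hypothetical set of candidate actions standing in for the continuous action space, a hand-written toy Q-function (`toy_q`), and a `temperature` parameter, none of which come from the paper. The paper's function approximators and on-line EM training are not reproduced here.

```python
import math
import random


def softmax_policy(q_values, temperature=1.0):
    """Return probabilities proportional to exp(Q / temperature)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtract the maximum before exponentiating for numerical stability.
    q_max = max(q_values)
    weights = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def sample_action(candidate_actions, q_function, state, temperature=1.0, rng=random):
    """Sample one action from the soft-max distribution over candidate actions."""
    q_values = [q_function(state, a) for a in candidate_actions]
    probs = softmax_policy(q_values, temperature)
    return rng.choices(candidate_actions, weights=probs, k=1)[0]


def toy_q(state, action):
    # Hypothetical Q-function (assumption): highest when the action cancels the state.
    return -(state + action) ** 2


# Assumption: a coarse grid of candidate actions approximates the continuous action space.
candidate_actions = [-1.0, -0.5, 0.0, 0.5, 1.0]
probs = softmax_policy([toy_q(0.5, a) for a in candidate_actions], temperature=0.1)
action = sample_action(candidate_actions, toy_q, state=0.5, temperature=0.1)
```

A lower `temperature` concentrates probability on the action with the highest Q-value, while a higher one makes the policy more exploratory.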

[1] Shin Ishii, et al. Reinforcement Learning Based on On-Line EM Algorithm, 1998, NIPS.

[2] Geoffrey E. Hinton, et al. An Alternative Model for Mixtures of Experts, 1994, NIPS.

[3] Sean R. Eddy, et al. What is dynamic programming?, 2004, Nature Biotechnology.

[4] Gerald Tesauro, et al. Practical issues in temporal difference learning, 1992, Machine Learning.

[5] Kenji Doya, et al. Reinforcement Learning in Continuous Time and Space, 2000, Neural Computation.

[6] Shin Ishii, et al. On-line EM Algorithm for the Normalized Gaussian Network, 2000, Neural Computation.

[7] Richard S. Sutton, et al. Neuronlike adaptive elements that can solve difficult learning control problems, 1983, IEEE Transactions on Systems, Man, and Cybernetics.

[8] Richard S. Sutton, et al. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding, 1995, NIPS.

[9] D. Rubin, et al. Maximum likelihood from incomplete data via the EM algorithm (with discussion), 1977.