Fuzzy C-means method for representation policy iteration in reinforcement learning

This paper introduces the Fuzzy C-means (FCM) method as the subsampling step of Representation Policy Iteration (RPI) in reinforcement learning. RPI is a class of algorithms that automatically learns both basis functions and an approximately optimal policy. The RPI procedure used in this paper is as follows. First, samples are collected using a random or guided policy. A subset of these samples is then selected with the FCM method, which serves as the subsampling step. Next, global basis functions, called proto-value functions (PVFs), are formed from the eigenfunctions of the graph Laplacian operator on an undirected graph constructed from the subsampled data. Finally, least-squares policy iteration (LSPI) is used as the parameter-estimation method to learn an approximately optimal policy. Illustrative experiments on the inverted pendulum problem compare the performance of RPI with FCM subsampling against RPI with the previously used subsampling method.
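The following is a minimal sketch, not the authors' implementation, of the two steps specific to this paper: FCM-based subsampling of the collected states and construction of PVFs from the graph Laplacian of an undirected neighborhood graph. It assumes a standard Fuzzy C-means update, a k-nearest-neighbor graph, and the unnormalized Laplacian; all function names, parameters (number of clusters, fuzzifier m, k, number of PVFs), and the cluster-representative selection rule are illustrative assumptions.

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.linalg import eigh

def fcm_subsample(samples, n_clusters=50, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Run Fuzzy C-means on the collected states and keep, for each cluster,
    the original sample closest to the cluster center (illustrative rule)."""
    rng = np.random.default_rng(seed)
    n = samples.shape[0]
    # Random initial membership matrix; each row sums to 1.
    u = rng.random((n, n_clusters))
    u /= u.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        um = u ** m
        # Weighted cluster centers.
        centers = um.T @ samples / um.sum(axis=0)[:, None]
        dist = cdist(samples, centers) + 1e-12
        # Standard FCM membership update.
        u_new = 1.0 / (dist ** (2.0 / (m - 1)))
        u_new /= u_new.sum(axis=1, keepdims=True)
        if np.abs(u_new - u).max() < tol:
            u = u_new
            break
        u = u_new
    # Representative subset: nearest original point to each cluster center.
    idx = cdist(centers, samples).argmin(axis=1)
    return samples[np.unique(idx)]

def proto_value_functions(subset, k_neighbors=10, n_pvfs=20):
    """Build a symmetric k-nearest-neighbor graph on the subsampled states and
    return the smoothest eigenvectors of its graph Laplacian as basis functions."""
    dist = cdist(subset, subset)
    W = np.zeros_like(dist)
    for i in range(len(subset)):
        nn = np.argsort(dist[i])[1:k_neighbors + 1]
        W[i, nn] = 1.0
    W = np.maximum(W, W.T)            # symmetrize -> undirected graph
    L = np.diag(W.sum(axis=1)) - W    # unnormalized graph Laplacian
    eigvals, eigvecs = eigh(L)
    return eigvecs[:, :n_pvfs]        # smoothest eigenvectors serve as PVFs
```

In the full RPI pipeline, the PVFs evaluated on the subsampled states would serve as the feature matrix for LSPI, with features at states outside the subset typically obtained by an out-of-sample extension such as the Nystrom method.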