Covariant Policy Search

We investigate the problem of non-covariant behavior in policy gradient reinforcement learning algorithms. The policy gradient approach is amenable to analysis by information-geometric methods. This leads us to propose a natural metric on controller parameterizations, derived by considering the manifold of probability distributions over paths induced by a stochastic controller. Investigation of this approach yields a covariant gradient ascent rule. We discuss interesting properties of this rule, including its relation to actor-critic style reinforcement learning algorithms. The algorithms discussed here are computationally quite efficient and, on some interesting problems, lead to dramatic performance improvements over non-covariant rules.
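The core idea of a covariant update is to precondition the vanilla policy gradient with the Fisher information matrix of the policy's distribution, so the ascent direction no longer depends on the arbitrary choice of parameterization. The following is a minimal sketch on a three-armed bandit with a softmax policy; the reward values, learning rate, and regularization constant are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

# Hedged sketch: covariant (natural) policy gradient on a 3-armed bandit.
# The Fisher information matrix of the action distribution serves as the
# metric on parameter space. Rewards, step size, and the ridge term are
# assumed values for illustration.

rewards = np.array([1.0, 0.0, 0.5])   # expected reward per action (assumed)
theta = np.zeros(3)                    # softmax policy parameters

def policy(theta):
    """Softmax distribution over the three actions."""
    p = np.exp(theta - theta.max())
    return p / p.sum()

for _ in range(200):
    p = policy(theta)
    # Score functions: row a is grad_theta log pi(a) = e_a - p.
    scores = np.eye(3) - p
    # Vanilla policy gradient: E_pi[ grad log pi(a) * r(a) ].
    g = scores.T @ (p * rewards)
    # Fisher metric: E_pi[ grad log pi grad log pi^T ], ridge-regularized
    # so the linear solve is well posed.
    F = scores.T @ (scores * p[:, None]) + 1e-3 * np.eye(3)
    # Covariant update: ascend along the natural gradient F^{-1} g.
    theta += 0.1 * np.linalg.solve(F, g)

print(policy(theta))  # mass concentrates on the highest-reward action
```

Because the update direction `F^{-1} g` transforms covariantly, reparameterizing the softmax (e.g., rescaling `theta`) would leave the induced change in the policy distribution unchanged, which is precisely the property a plain gradient step lacks.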
