Covariant Policy Search

We investigate the problem of non-covariant behavior in policy gradient reinforcement learning algorithms. The policy gradient approach is amenable to analysis by information-geometric methods. This leads us to propose a natural metric on controller parameterizations, derived by considering the manifold of probability distributions over paths induced by a stochastic controller. Investigation of this approach yields a covariant gradient ascent rule. We discuss interesting properties of this rule, including its relation to actor-critic style reinforcement learning algorithms. The algorithms discussed here are computationally quite efficient and, on some interesting problems, lead to dramatic performance improvements over non-covariant rules.
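The core idea of a covariant update is to precondition the vanilla policy gradient with the Fisher information matrix of the policy's distribution, so the ascent direction no longer depends on the arbitrary choice of parameterization. The following is a minimal sketch on a three-armed bandit with a softmax policy; the reward values, learning rate, and regularization constant are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

# Hedged sketch: covariant (natural) policy gradient on a 3-armed bandit.
# The Fisher information matrix of the action distribution serves as the
# metric on parameter space. Rewards, step size, and the ridge term are
# assumed values for illustration.

rewards = np.array([1.0, 0.0, 0.5])   # expected reward per action (assumed)
theta = np.zeros(3)                    # softmax policy parameters

def policy(theta):
    """Softmax distribution over the three actions."""
    p = np.exp(theta - theta.max())
    return p / p.sum()

for _ in range(200):
    p = policy(theta)
    # Score functions: row a is grad_theta log pi(a) = e_a - p.
    scores = np.eye(3) - p
    # Vanilla policy gradient: E_pi[ grad log pi(a) * r(a) ].
    g = scores.T @ (p * rewards)
    # Fisher metric: E_pi[ grad log pi grad log pi^T ], ridge-regularized
    # so the linear solve is well posed.
    F = scores.T @ (scores * p[:, None]) + 1e-3 * np.eye(3)
    # Covariant update: ascend along the natural gradient F^{-1} g.
    theta += 0.1 * np.linalg.solve(F, g)

print(policy(theta))  # mass concentrates on the highest-reward action
```

Because the update direction `F^{-1} g` transforms covariantly, reparameterizing the softmax (e.g., rescaling `theta`) would leave the induced change in the policy distribution unchanged, which is precisely the property a plain gradient step lacks.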
