Towards Better Laplacian Representation in Reinforcement Learning with Generalized Graph Drawing

The Laplacian representation has recently attracted increasing attention in reinforcement learning, as it provides a succinct and informative representation of states by taking the eigenvectors of the Laplacian matrix of the state-transition graph as state embeddings. Such a representation captures the geometry of the underlying state space and benefits RL tasks such as option discovery and reward shaping. To approximate the Laplacian representation in large (or even continuous) state spaces, recent works propose minimizing a spectral graph drawing objective, which, however, has infinitely many global minimizers other than the eigenvectors. As a result, the learned Laplacian representation may differ from the ground truth. To solve this problem, we reformulate the graph drawing objective into a generalized form and derive a new learning objective that is proven to have the eigenvectors as its unique global minimizer. This enables learning high-quality Laplacian representations that faithfully approximate the ground truth. We validate this via comprehensive experiments on a set of gridworld and continuous control environments. Moreover, we show that our learned Laplacian representations lead to more exploratory options and better reward shaping.
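To make the contrast in the abstract concrete, the sketch below juxtaposes the two objectives. The notation (graph Laplacian L, embedding dimension d, and the decreasing weights c_k, e.g. c_k = d - k + 1) is illustrative and not quoted from the paper: the standard graph drawing objective weights all d embedding dimensions equally, so any rotation of the smallest d eigenvectors attains the same value, whereas strictly decreasing per-dimension weights remove that rotational freedom.

% Standard spectral graph drawing objective: all d dimensions weighted equally.
% Any orthonormal basis of the bottom-d eigenspace of L is a global minimizer.
\min_{u_1, \dots, u_d} \; \sum_{k=1}^{d} u_k^\top L u_k
\quad \text{s.t.} \quad u_j^\top u_k = \delta_{jk}.

% Generalized graph drawing objective (illustrative form): strictly decreasing
% weights c_1 > c_2 > \dots > c_d > 0, under which the bottom-d eigenvectors of L
% are the unique global minimizer (up to sign).
\min_{u_1, \dots, u_d} \; \sum_{k=1}^{d} c_k \, u_k^\top L u_k
\quad \text{s.t.} \quad u_j^\top u_k = \delta_{jk}.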
