S-TRIGGER: Continual State Representation Learning via Self-Triggered Generative Replay

We consider the problem of building a state representation model for control in a continual learning setting. As the environment changes, the aim is to efficiently compress the sensory state information without losing past knowledge, and then to use Reinforcement Learning on the resulting features for efficient policy learning. To this end, we propose S-TRIGGER, a general method for Continual State Representation Learning applicable to Variational Auto-Encoders and their many variants. The method is based on Generative Replay, i.e. the use of generated samples to maintain past knowledge. It comes with a statistically sound environment-change detection method, which self-triggers the Generative Replay. Our experiments on VAEs show that S-TRIGGER learns state representations that allow fast and high-performing Reinforcement Learning while avoiding catastrophic forgetting. The resulting system is capable of autonomously learning new information without using past data and with a bounded system size. Code for our experiments is attached in the Appendix.
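To make the self-trigger concrete, below is a minimal sketch of how such a system could be wired together. It assumes, beyond what the abstract states, that environment change is detected by comparing the VAE's per-sample reconstruction errors on incoming observations against a reference window with Welch's t-test, and that, once a change is flagged, training continues on a mix of new observations and samples decoded from a frozen snapshot of the previous VAE. All names (`SimpleVAE`, `environment_changed`, `continual_update`) are illustrative, not the paper's actual API.

```python
# Hedged sketch: self-triggered generative replay for a VAE state encoder.
# Assumptions beyond the abstract: change detection via Welch's t-test on
# reconstruction errors; replay by decoding latent samples from the prior.

import copy
import torch
import torch.nn as nn
from scipy import stats


class SimpleVAE(nn.Module):
    def __init__(self, obs_dim=64, latent_dim=8):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                 nn.Linear(128, 2 * latent_dim))
        self.dec = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                 nn.Linear(128, obs_dim))
        self.latent_dim = latent_dim

    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        return self.dec(z), mu, logvar

    def generate(self, n):
        # Generative replay source: decode samples drawn from the prior.
        with torch.no_grad():
            return self.dec(torch.randn(n, self.latent_dim))


def recon_errors(vae, x):
    # Per-sample reconstruction error, used as the change-detection statistic.
    with torch.no_grad():
        recon, _, _ = vae(x)
        return ((recon - x) ** 2).mean(dim=-1).cpu().numpy()


def environment_changed(vae, reference_obs, incoming_obs, alpha=0.01):
    # Welch's t-test (unequal variances) between the two error distributions.
    _, p = stats.ttest_ind(recon_errors(vae, reference_obs),
                           recon_errors(vae, incoming_obs),
                           equal_var=False)
    return p < alpha


def elbo_loss(vae, x):
    recon, mu, logvar = vae(x)
    rec = ((recon - x) ** 2).sum(dim=-1).mean()
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=-1).mean()
    return rec + kl


def continual_update(vae, optimizer, reference_obs, incoming_obs, steps=100):
    """If a change is detected, retrain on new data mixed with replayed samples."""
    if not environment_changed(vae, reference_obs, incoming_obs):
        return vae
    frozen = copy.deepcopy(vae).eval()  # snapshot holding past knowledge
    for _ in range(steps):
        replay = frozen.generate(incoming_obs.shape[0])
        batch = torch.cat([incoming_obs, replay], dim=0)
        optimizer.zero_grad()
        loss = elbo_loss(vae, batch)
        loss.backward()
        optimizer.step()
    return vae
```

In a full pipeline, `incoming_obs` would come from the agent's recent sensory stream and the updated encoder would feed its latent features to an RL policy; snapshotting the old generator and replaying from it is what keeps the system size bounded without storing past data, consistent with the abstract's claims.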
