Detecting Changes and Avoiding Catastrophic Forgetting in Dynamic Partially Observable Environments

The ability of an agent to detect changes in its environment is key to successful adaptation. This ability involves at least two phases: learning a model of the environment, and detecting that a change has likely occurred when the model is no longer accurate. The task is particularly challenging in partially observable environments, such as those modeled with partially observable Markov decision processes (POMDPs). Some predictive learners can infer the state from observations and thus perform better under partial observability. Predictive state representations (PSRs) and neural networks are two such tools that can be trained to predict the probabilities of future observations. However, most existing methods focus on static problems in which only one environment is learned. In this paper, we propose an algorithm that uses statistical tests to estimate the probability that each predictive model fits the current environment. We exploit the underlying probability distributions of the predictive models to provide a fast and explainable method for assessing and justifying the agent's beliefs about the current environment. Crucially, the method can label incoming data as fitting different models and can therefore continuously train separate models in different environments. We show that this prevents catastrophic forgetting when new environments, or tasks, are encountered. The method can also be useful when AI-informed decisions require justification, because its beliefs are grounded in statistical evidence from observations. We empirically demonstrate the benefits of the method with simulations in a set of POMDP environments.
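The core mechanism described above — testing whether each candidate model's predicted observation distribution is consistent with recently observed data, and flagging a likely environment change when every model is rejected — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the use of a Pearson chi-squared goodness-of-fit statistic, and the fixed rejection threshold are all assumptions made for the example.

```python
import math


def chi_square_stat(observed_counts, predicted_probs):
    """Pearson chi-squared statistic comparing observed observation
    counts against the counts a predictive model expects.

    observed_counts: list of counts per discrete observation symbol.
    predicted_probs: the model's predicted probability per symbol.
    """
    n = sum(observed_counts)
    stat = 0.0
    for obs, prob in zip(observed_counts, predicted_probs):
        expected = n * prob
        if expected > 0:
            stat += (obs - expected) ** 2 / expected
    return stat


def best_fitting_model(observed_counts, models, threshold):
    """Return the index of the model whose predictions best fit the
    recent data, or None if every model is rejected -- a signal that
    the environment has likely changed and a new model is needed.

    `threshold` is a hypothetical critical value for the chi-squared
    statistic (e.g. taken from the chi-squared distribution with
    k - 1 degrees of freedom at a chosen significance level).
    """
    stats = [chi_square_stat(observed_counts, m) for m in models]
    best = min(range(len(models)), key=lambda i: stats[i])
    return best if stats[best] <= threshold else None


# Two candidate environment models over three observation symbols.
models = [[0.7, 0.2, 0.1],   # model learned in environment A
          [0.1, 0.2, 0.7]]   # model learned in environment B

# Recent counts clearly consistent with environment A:
# data is routed to model 0, which can keep training.
print(best_fitting_model([68, 22, 10], models, threshold=5.99))

# Counts consistent with neither model: None signals a novel
# environment, so a fresh model would be spawned instead of
# overwriting (and catastrophically forgetting) an existing one.
print(best_fitting_model([33, 34, 33], models, threshold=5.99))
```

Routing each batch of data to the accepted model (or to a newly created one) is what keeps the per-environment models separate, and the chi-squared statistic itself provides the statistical evidence the abstract refers to when justifying the agent's belief.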
