Sequential Decision-Making under Non-stationary Environments via Sequential Change-point Detection

Reinforcement Learning (RL) has mainly focused on computing an optimal policy for an agent acting in a stationary environment. However, in many real-world decision problems the stationarity assumption does not hold. A non-stationary environment can be viewed as a set of contexts (also called modes or modules), where each context corresponds to a possible stationary dynamics of the environment. While most approaches assume that the number of modes is known, an RL method, Reinforcement Learning with Context Detection (RLCD), has recently been proposed to learn an a priori unknown set of contexts and to detect context changes. In this paper, we propose a new approach that adapts tools developed in statistics, and more precisely in sequential analysis, to detect environmental changes. Our approach is thus more theoretically grounded and requires fewer parameters than RLCD. We also show that our parameters are easier to interpret and therefore easier to tune. Finally, we show experimentally that our approach outperforms current methods on several application problems.
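To illustrate the kind of sequential change-point detector the abstract refers to, the sketch below implements Page's CUSUM test for a shift in the mean of Gaussian observations. This is a standard tool from sequential analysis, not the paper's exact detector; the pre- and post-change means, noise level, and threshold are illustrative assumptions.

```python
import numpy as np

def cusum_change_detector(samples, mu0, mu1, sigma, threshold):
    """Page's CUSUM test: detect a shift from mean mu0 to mean mu1
    in a stream of Gaussian observations with known std sigma.

    Returns the first index at which the one-sided cumulative
    log-likelihood-ratio statistic exceeds `threshold`, or None.
    """
    g = 0.0
    for t, x in enumerate(samples):
        # Log-likelihood ratio of N(mu1, sigma^2) vs N(mu0, sigma^2) at x.
        llr = (mu1 - mu0) / sigma**2 * (x - (mu0 + mu1) / 2.0)
        # Resetting at zero keeps the statistic one-sided (Page's recursion).
        g = max(0.0, g + llr)
        if g > threshold:
            return t
    return None

# Illustrative usage: mean shifts from 0 to 2 at time step 50.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0.0, 1.0, 50), rng.normal(2.0, 1.0, 50)])
alarm = cusum_change_detector(data, mu0=0.0, mu1=2.0, sigma=1.0, threshold=10.0)
```

The threshold trades off detection delay against false-alarm rate, which is why such parameters admit a direct statistical interpretation.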

[1]  T. Lai. Sequential Analysis: Some Classical Problems and New Challenges, 2001.

[2]  Paulo Martins Engel, et al. Dealing with non-stationary environments using context detection, 2006, ICML.

[3]  Sean R. Eddy, et al. What is dynamic programming?, 2004, Nature Biotechnology.

[4]  Mitsuo Kawato, et al. Multiple Model-Based Reinforcement Learning, 2002, Neural Computation.

[5]  R. Khan, et al. Sequential Tests of Statistical Hypotheses, 1972.

[6]  Michèle Basseville, et al. Detection of Abrupt Changes: Theory and Application, 1993.

[7]  R. Bellman, et al. Dynamic Programming and Markov Processes, 1960.

[8]  Dit-Yan Yeung, et al. Hidden-Mode Markov Decision Processes for Nonstationary Sequential Decision Making, 2001, Sequence Learning.

[9]  Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.

[10]  David Hsu, et al. Planning under Uncertainty for Robotic Tasks with Mixed Observability, 2010, Int. J. Robotics Res..

[11]  Paul Weng, et al. Solving Hidden-Semi-Markov-Mode Markov Decision Problems, 2014, SUM.

[12]  Kumpati S. Narendra, et al. Adaptation and learning using multiple models, switching, and tuning, 1995.

[13]  Olivier Buffet, et al. MOMDPs: A Solution for Modelling Adaptive Management Problems, 2012, AAAI.

[14]  B. K. Ghosh, et al. Handbook of Sequential Analysis, 1991.

[15]  Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.

[16]  Dit-Yan Yeung, et al. An Environment Model for Nonstationary Reinforcement Learning, 1999, NIPS.