Autonomous learning of POMDP state representations from surprises

There is an ever-increasing need for autonomous robots capable of operating in a range of challenging environments that exhibit both partial observability and stochasticity. Standard techniques for learning in such environments often require human-engineered features, most commonly a human-designed state space. Engineering these features demands extensive domain knowledge, and changes to the task or the agent often necessitate re-engineering. These limitations have given rise to end-to-end, predictive approaches, such as Predictive State Representations (PSRs) and our Stochastic Distinguishing Experiments (SDEs), that encode a representation of the agent's state in the probabilities of key sequences of actions and observations (i.e., experiments the agent can perform). The problem of discovering appropriate experiments has remained extremely challenging, in part because existing techniques treat them as decoupled from any latent structure in the agent's environment. In this paper, we extend our SDE representation into a hybrid latent-predictive representation of state that can provably model a useful subclass of POMDP environments exactly (and any POMDP environment approximately). We provide an active, incremental algorithm for autonomously learning such representations in unknown environments from experience. The key idea is that the agent begins with only its observations as a state space and splits those states into a hierarchy of additional latent states whenever it is surprised by the entropy of the outcomes of repeatedly executed experiments; these experiments are automatically designed and selected, based on those surprises, to statistically disambiguate identical-looking states. The outcomes of these experiments then form unique predictive labels for each latent state. We present experimental results demonstrating the feasibility of this learning procedure. An expanded version of this paper provides theoretical proofs of the representational capacity of the model.
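To make the surprise-driven splitting loop concrete, the sketch below gives one possible reading of it in Python. It is only an illustration under stated assumptions: the toy environment interface (reset()/step() returning observations), the StateNode class, the entropy threshold, and the splitting rule are hypothetical stand-ins for exposition, not the SDE algorithm as specified in the paper.

```python
# A minimal, hypothetical sketch of surprise-driven state splitting.
# All names (StateNode, outcome_entropy, split_threshold, the toy
# environment interface) are illustrative assumptions, not the paper's
# actual SDE data structures or learning procedure.
import math
import random
from collections import Counter, defaultdict


def outcome_entropy(counts):
    """Shannon entropy (bits) of an empirical outcome distribution."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


class StateNode:
    """A state: initially just an observation label, refined with the
    outcome of a distinguishing experiment when the state is split."""
    def __init__(self, observation, experiment_outcome=None):
        self.observation = observation
        self.experiment_outcome = experiment_outcome  # predictive label

    def label(self):
        return (self.observation, self.experiment_outcome)


def learn_states(env, experiments, episodes=500, split_threshold=0.5):
    """Start from observation-only states; split a state when repeated
    executions of an experiment from it yield surprisingly high entropy.

    Assumes a toy environment where env.reset() returns an observation
    and env.step(action) returns the next observation."""
    stats = defaultdict(Counter)   # (state label, experiment) -> outcome counts
    states = {}                    # label -> StateNode

    for _ in range(episodes):
        obs = env.reset()
        node = states.setdefault((obs, None), StateNode(obs))

        # Run one experiment (a fixed action sequence) and record its outcome.
        exp = random.choice(experiments)
        outcome = tuple(env.step(a) for a in exp)
        stats[(node.label(), exp)][outcome] += 1

        # Surprise check: if outcomes from an identical-looking state are
        # too unpredictable, split it into latent states keyed by outcome.
        if outcome_entropy(stats[(node.label(), exp)]) > split_threshold:
            for seen_outcome in stats[(node.label(), exp)]:
                refined = StateNode(obs, (exp, seen_outcome))
                states[refined.label()] = refined

    return states
```

In the approach described above, the surprises also drive which experiments are designed next; in this sketch the candidate experiments are fixed up front purely to keep the example short.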
