Reusing Risk-Aware Stochastic Abstract Policies in Robotic Navigation Learning

In this paper we improve the learning performance of a risk-aware robot facing navigation tasks by employing transfer learning; that is, we use information from a previously solved task to accelerate learning in a new task. To do so, we transfer risk-aware memoryless stochastic abstract policies into the new task. We show how to incorporate risk awareness into robotic navigation tasks, in particular when tasks are modeled as stochastic shortest path problems. We then show how a modified policy iteration algorithm, called AbsProb-PI, can be used to obtain risk-neutral and risk-prone memoryless stochastic abstract policies. Finally, we propose a method that combines abstract policies and show how to use the combined policy in a new navigation task. Experiments validate our proposals and show that one can find effective abstract policies that improve robot behavior in navigation problems.
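
As an illustration of the kind of object being transferred, the sketch below represents a memoryless stochastic abstract policy as a mapping from abstract states to probability distributions over abstract actions, and combines two such policies by a convex mixture. The mixture rule, the weight `w`, and the toy navigation abstraction are assumptions chosen for illustration only; they are not the paper's exact AbsProb-PI procedure or combination method.

```python
import random

def combine_policies(pi_a, pi_b, w=0.5):
    """Mix two stochastic abstract policies state-by-state (convex combination)."""
    combined = {}
    for s in set(pi_a) | set(pi_b):
        dist_a = pi_a.get(s, {})
        dist_b = pi_b.get(s, {})
        actions = set(dist_a) | set(dist_b)
        combined[s] = {a: w * dist_a.get(a, 0.0) + (1 - w) * dist_b.get(a, 0.0)
                       for a in actions}
    return combined

def sample_action(policy, abstract_state):
    """Draw an abstract action from the policy's distribution for this abstract state."""
    dist = policy[abstract_state]
    actions, probs = zip(*dist.items())
    return random.choices(actions, weights=probs, k=1)[0]

# Hypothetical abstract policies over a toy navigation abstraction.
pi_neutral = {"near-wall": {"follow-wall": 0.8, "turn": 0.2}}
pi_prone   = {"near-wall": {"go-straight": 0.7, "turn": 0.3}}

pi_mix = combine_policies(pi_neutral, pi_prone, w=0.6)
print(sample_action(pi_mix, "near-wall"))
```

In a new task, the combined abstract policy would be queried with the abstraction of the current state and the sampled abstract action grounded to a concrete action, so the transferred knowledge can bias exploration while learning proceeds.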
