Energy Efficient Execution of POMDP Policies

Recent advances in planning techniques for partially observable Markov decision processes (POMDPs) have focused on online search and offline point-based value iteration. While these techniques allow practitioners to obtain policies for fairly large problems, they assume that a non-negligible amount of computation can be done between decision points. In contrast, the recent proliferation of mobile and embedded devices has led to a surge of applications that could benefit from state-of-the-art planning techniques if those techniques could operate under severe constraints on computational resources. To that end, we describe two techniques for compiling policies into controllers that can be executed by a mere table lookup at each decision point. The first compiles policies induced by a set of alpha vectors (such as those produced by point-based techniques) into approximately equivalent controllers, while the second uses simulation to compile arbitrary policies into approximately equivalent controllers. We also describe an approach to compress controllers by removing redundant and dominated nodes, often yielding smaller and yet better controllers. Further compression and higher value can sometimes be obtained by considering stochastic controllers. The compilation and compression techniques are demonstrated on benchmark problems as well as a mobile application that helps persons with Alzheimer's disease find their way. The battery consumption of several POMDP policies is compared against that of finite-state controllers obtained with the methods introduced in this paper. Experiments on a Nexus 4 phone show that finite-state controllers are the least battery-consuming way to execute POMDP policies.
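To make the computational contrast concrete, the sketch below compares the per-step work of the two execution schemes discussed in the abstract: an alpha-vector policy requires a Bayesian belief update followed by a maximization over dot products, whereas a finite-state controller needs only two table lookups. This is a minimal illustration, not the paper's implementation; the matrices `T` and `Z` and the arrays `alpha_vectors`, `alpha_actions`, `node_action`, and `node_transition` are assumed conventions introduced here for the example.

```python
import numpy as np

# --- Alpha-vector policy execution ---
# Each step needs a belief update and a max over |Gamma| dot products,
# roughly O(|S|^2 + |Gamma|*|S|) arithmetic per decision point.

def belief_update(belief, action, obs, T, Z):
    """Bayesian belief update: b'(s') ~ Z[a][s', o] * sum_s T[a][s, s'] * b(s).

    Assumed conventions: T[a][s, s'] = P(s' | s, a), Z[a][s', o] = P(o | s', a).
    """
    b = Z[action][:, obs] * (belief @ T[action])
    return b / b.sum()

def alpha_vector_action(belief, alpha_vectors, alpha_actions):
    """Execute the action of the alpha vector maximizing the dot product with b."""
    best = int(np.argmax(alpha_vectors @ belief))  # alpha_vectors: |Gamma| x |S|
    return alpha_actions[best]

# --- Finite-state controller execution ---
# Each step is two table lookups, O(1) and no floating-point work, which is
# what makes compiled controllers attractive on battery-powered devices.

def controller_step(node, obs, node_action, node_transition):
    """Return the action of the current node and the successor node for obs."""
    return node_action[node], node_transition[node][obs]
```

A stochastic controller, as considered in the paper for further compression, would replace the deterministic lookups with draws from per-node action and successor distributions; the per-step cost remains constant, independent of the sizes of the state and belief spaces.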
