Control Theory Meets POMDPs: A Hybrid Systems Approach

Partially observable Markov decision processes (POMDPs) provide a modeling framework for a wide range of sequential decision-making problems under uncertainty in artificial intelligence (AI). Because the states of a POMDP are not directly observable, decisions must be based on the output of a Bayesian filter, i.e., a continuous belief over the states. As a result, POMDPs are often computationally intractable to solve exactly, and researchers resort to approximate methods that typically discretize the continuous belief space. These approximate solutions are, however, prone to discretization errors, which limits the use of POMDPs in applications where guarantees on safety, optimality, or performance are required. To overcome this complexity challenge, we apply notions from control theory. The goal is to determine the reachable belief space of a POMDP, that is, the set of all beliefs that can be reached from an initial belief distribution over the states under a given set of actions and observations. We begin by casting the analysis of a POMDP as the analysis of a discrete-time switched system. To estimate the reachable belief space, we compute over-approximations in terms of sub-level sets of Lyapunov functions. Furthermore, to verify safety and optimality requirements of a given POMDP, we formulate a barrier certificate theorem: if there exists a barrier certificate satisfying a set of inequalities together with the belief update equation of the POMDP, then the safety and optimality properties are guaranteed to hold. In both cases, we show how the computations can be decomposed into smaller problems that can be solved in parallel. The conditions we formulate can be implemented computationally as a set of sum-of-squares programs. We illustrate the applicability of our method on two problems: active ad scheduling and machine teaching.
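For concreteness, the belief update underlying the switched-system view is the standard Bayesian filter for a POMDP. The sketch below states that update and one common form of discrete-time barrier certificate conditions; the notation ($T$ for the transition kernel, $O$ for the observation likelihood, $\mathcal{B}_0$ and $\mathcal{B}_u$ for the initial and unsafe belief sets) is illustrative and may differ from the paper's.

\[
  b_{t+1}(s') \;=\; \frac{O(o_{t+1}\mid s', a_t)\,\sum_{s\in S} T(s'\mid s, a_t)\, b_t(s)}
                         {\sum_{s''\in S} O(o_{t+1}\mid s'', a_t)\,\sum_{s\in S} T(s''\mid s, a_t)\, b_t(s)}
  \;=:\; f_{a_t,\,o_{t+1}}(b_t).
\]

Writing the update as a family of maps $f_{a,o}$ indexed by the action-observation pair is what turns the belief dynamics into a discrete-time switched system on the belief simplex. A function $B$ then serves as a barrier certificate for safety if, for instance,
\[
  B(b) \le 0 \;\; \forall\, b \in \mathcal{B}_0, \qquad
  B(b) > 0 \;\; \forall\, b \in \mathcal{B}_u, \qquad
  B\big(f_{a,o}(b)\big) - B(b) \le 0 \;\; \forall\, b,\, a,\, o,
\]
since $B$ then remains nonpositive along every belief trajectory starting in $\mathcal{B}_0$, so no such trajectory can enter $\mathcal{B}_u$. When $B$ is polynomial, each inequality can be relaxed to a sum-of-squares constraint and checked by semidefinite programming.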
