Sparse tree search optimality guarantees in POMDPs with continuous observation spaces

Partially observable Markov decision processes (POMDPs) with continuous state and observation spaces offer powerful flexibility for representing real-world decision and control problems, but they are notoriously difficult to solve. Recent online sampling-based algorithms that use observation likelihood weighting have shown unprecedented effectiveness in domains with continuous observation spaces. However, there has been no formal theoretical justification for this technique. This work offers such a justification, proving that a simplified algorithm, partially observable weighted sparse sampling (POWSS), estimates Q-values accurately with high probability and can be made to perform arbitrarily close to the optimal solution by increasing computational power.
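To make the idea of observation likelihood weighting concrete, the following is a minimal Python sketch of a POWSS-style Q-value estimator, written under stated assumptions rather than taken from the paper: it assumes a generative model exposing step(s, a) -> (s', o, r) and an observation density obs_pdf(o, a, s'), and the LightDark1D toy problem is a hypothetical stand-in used only to exercise the code.

import math
import random

# Sketch of POWSS-style weighted sparse sampling (illustrative, not the
# authors' reference implementation). A belief node is a list of
# (state, weight) particles; `depth` counts remaining planning steps.

def estimate_v(particles, depth, model):
    """V(b) = max over actions of the Q estimate at this belief node."""
    return max(estimate_q(particles, a, depth, model) for a in model.actions)

def estimate_q(particles, action, depth, model):
    """Self-normalized importance-weighted Q estimate from weighted particles."""
    # One generative-model call per particle: (next_state, observation, reward).
    successors = [model.step(s, action) for s, _ in particles]
    total_w = sum(w for _, w in particles)
    q = 0.0
    for (_, obs, reward), (_, w) in zip(successors, particles):
        if depth <= 1:
            future = 0.0
        else:
            # Child belief for this particle's observation: every sampled next
            # state, reweighted by its likelihood of having produced `obs`.
            child = [(s2, w2 * model.obs_pdf(obs, action, s2))
                     for (s2, _, _), (_, w2) in zip(successors, particles)]
            future = estimate_v(child, depth - 1, model)
        q += w * (reward + model.gamma * future)
    return q / total_w

class LightDark1D:
    """Hypothetical toy problem with a continuous (Gaussian) observation."""
    actions = (-1.0, 0.0, 1.0)
    gamma = 0.95

    def step(self, s, a):
        s2 = s + a
        o = random.gauss(s2, 1.0)  # noisy reading of the new position
        r = 1.0 if abs(s2) < 0.5 and a == 0.0 else 0.0
        return s2, o, r

    def obs_pdf(self, o, a, s2):
        # Density of o under N(s2, 1), matching step() above.
        return math.exp(-0.5 * (o - s2) ** 2) / math.sqrt(2.0 * math.pi)

model = LightDark1D()
root = [(random.gauss(2.0, 1.0), 1.0) for _ in range(5)]  # C = 5 particles
best = max(model.actions, key=lambda a: estimate_q(root, a, 3, model))

The key point the sketch illustrates is that each sampled observation spawns a child belief containing all C sampled next states, reweighted by the observation density; no discretization of the continuous observation space is needed. Since each node branches over |A| actions and C observations, cost grows roughly as (|A| C)^depth, which is why C and the depth are kept tiny in this toy run.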
