Multiple stopping time POMDPs: Structural results

This paper considers a multiple stopping problem on an infinite-horizon sample path of a Hidden Markov model, where a reward that depends on the underlying state is associated with each stop. The decision maker stops L times to maximize the total expected revenue. The aim is to determine the structure of the optimal multiple stopping policy. The formulation generalizes the classical (single) stopping time Partially Observed Markov Decision Process (POMDP) problem. Even though the stopping set (in terms of the Bayesian beliefs) is not necessarily convex, we show that it is a connected set. The structural results are illustrated using a numerical example.
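
To make the setup concrete, the following is a minimal sketch, not the paper's method, of how a multiple stopping POMDP can be solved numerically for a two-state Hidden Markov model. Everything in it is an assumption made for illustration: the transition matrix P, observation matrix B, stop reward vector r, discount factor rho, the belief grid, the choice that continuing earns no reward, and the choice that the chain keeps evolving after each stop. With V_0 = 0 and T(pi, y), sigma(pi, y) denoting the HMM filter update and the observation likelihood, the sketch iterates V_l(pi) = max{ r'pi + rho * sum_y sigma(pi, y) V_{l-1}(T(pi, y)), rho * sum_y sigma(pi, y) V_l(T(pi, y)) } for l = 1, ..., L and records where stopping is optimal. Because a two-state belief reduces to the scalar p = P(x = 1), connectedness of a stopping set on the grid reduces to checking that it is a single interval.

```python
import numpy as np

# Hypothetical model parameters, chosen only for illustration.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])   # P[i, j] = P(x_{k+1} = j | x_k = i)
B = np.array([[0.8, 0.2],
              [0.3, 0.7]])   # B[j, y] = P(y_k = y | x_k = j)
r = np.array([0.0, 1.0])     # assumed reward for stopping in state 0 or state 1
rho = 0.9                    # assumed discount factor for the infinite horizon
L = 3                        # total number of stops
grid = np.linspace(0.0, 1.0, 101)                 # p = P(x_k = 1)
beliefs = np.stack([1.0 - grid, grid], axis=1)    # belief vectors [1 - p, p]


def filter_update(pi, y):
    """HMM filter: posterior T(pi, y) and normalizer sigma(pi, y) = P(y | pi)."""
    unnorm = B[:, y] * (P.T @ pi)
    sigma = unnorm.sum()
    return unnorm / sigma, sigma


# Filter outputs depend only on (p, y), so tabulate them once per grid point.
SIG = np.array([[filter_update(b, y)[1] for b in beliefs] for y in range(B.shape[1])])
POST = np.array([[filter_update(b, y)[0][1] for b in beliefs] for y in range(B.shape[1])])


def expected_next(V):
    """sum_y sigma(pi, y) V(T(pi, y)) on the grid, with V linearly interpolated."""
    return sum(SIG[y] * np.interp(POST[y], grid, V) for y in range(B.shape[1]))


def solve(tol=1e-10, max_iter=10_000):
    """Value iteration for V_1, ..., V_L (V_0 = 0); also returns the stopping sets."""
    V = [np.zeros_like(grid)]
    stop_sets = []
    for l_stops in range(1, L + 1):
        # Stopping earns r'pi now and leaves l_stops - 1 stops for the future.
        stop = beliefs @ r + rho * expected_next(V[l_stops - 1])
        W = np.zeros_like(grid)
        for _ in range(max_iter):
            W_new = np.maximum(stop, rho * expected_next(W))
            done = np.max(np.abs(W_new - W)) < tol
            W = W_new
            if done:
                break
        V.append(W)
        # Small slack absorbs the residual error of the value iteration.
        stop_sets.append(stop >= rho * expected_next(W) - 1e-8)
    return V, stop_sets


def is_connected(mask):
    """True if the grid points where mask holds form one contiguous interval."""
    idx = np.flatnonzero(mask)
    return idx.size == 0 or bool(idx[-1] - idx[0] + 1 == idx.size)


if __name__ == "__main__":
    V, S = solve()
    for l_stops, mask in enumerate(S, start=1):
        print(l_stops, "stops left: stopping set connected on grid?", is_connected(mask))
```

Tabulating the filter outputs once keeps every value-iteration sweep vectorized. The grid check only covers the two-state case; for more states the belief space is a simplex, and connectedness would need a different test.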
