Bayesian Sequential Detection with Phase-Distributed Change Time and Nonlinear Penalty -- A POMDP Approach

We show that the optimal decision policy for several types of Bayesian sequential detection problems has a threshold switching curve structure on the space of posterior distributions. This is established using lattice programming and stochastic orders in a partially observed Markov decision process (POMDP) framework. A stochastic gradient algorithm is presented to estimate the optimal linear approximation to this threshold curve. We illustrate these results by first considering quickest time detection with a phase-type distributed change time and a variance stopping penalty. It is then proved that the threshold switching curve also arises in several other Bayesian decision problems, such as quickest transient detection, exponential delay (risk-sensitive) penalties, stopping time problems in social learning, and multi-agent scheduling in a changing world. Using Blackwell dominance, it is shown that for dynamic decision-making problems the optimal decision policy is lower bounded by a myopic policy. Finally, it is shown how the achievable cost of the optimal decision policy varies with the change time distribution by imposing a partial order on transition matrices.
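To make the stochastic gradient step concrete, here is a minimal Python sketch, not the paper's implementation: it assumes a hypothetical 3-state phase-type change model with Gaussian observations, and every number in it (the transition matrix P, the observation means mu, the noise level sigma, the delay/false-alarm costs d and f, the SPSA gain sequences, and the initial theta) is made up for illustration. The sketch runs the Bayesian (HMM) filter to track the posterior over the chain's state, stops according to a linear threshold rule on the posterior simplex, and tunes the rule's coefficients by simultaneous-perturbation (SPSA-style) stochastic gradient descent on the simulated cost.

import numpy as np

rng = np.random.default_rng(0)

# Toy phase-type change model (all numbers hypothetical).
# State 0 is the post-change (absorbing) state; states 1 and 2 are the
# pre-change phases, so the change time is phase-type distributed.
P = np.array([[1.0, 0.0, 0.0],
              [0.3, 0.7, 0.0],
              [0.1, 0.2, 0.7]])
X = P.shape[0]
mu = np.array([1.0, 0.0, 0.0])   # the change shifts the observation mean
sigma = 1.0                      # observation noise standard deviation
d, f = 1.0, 50.0                 # delay cost per step, false-alarm cost

def hmm_filter(pi, y):
    # One Bayesian filter step: predict with P, correct with likelihood of y.
    likes = np.exp(-0.5 * ((y - mu) / sigma) ** 2)
    post = likes * (P.T @ pi)
    return post / post.sum()

def episode_cost(theta, horizon=200):
    # Sample cost of the linear threshold policy "stop when theta . pi >= 1".
    x = X - 1                          # start in the last pre-change phase
    pi = np.zeros(X)
    pi[x] = 1.0
    delay = 0
    for _ in range(horizon):
        x = rng.choice(X, p=P[x])      # Markov chain step
        delay += (x == 0)              # count steps spent after the change
        y = mu[x] + sigma * rng.standard_normal()
        pi = hmm_filter(pi, y)
        if theta @ pi >= 1.0:          # linear approximation to the curve
            return d * delay + f * (x != 0)
    return d * delay                   # forced stop at the horizon

def spsa(theta, iters=500, batch=20):
    # Simultaneous-perturbation stochastic gradient descent on theta.
    avg = lambda t: np.mean([episode_cost(t) for _ in range(batch)])
    for n in range(1, iters + 1):
        a, c = 0.1 / n ** 0.602, 0.1 / n ** 0.101   # decaying SPSA gains
        delta = rng.choice([-1.0, 1.0], size=theta.size)
        ghat = (avg(theta + c * delta) - avg(theta - c * delta)) / (2 * c * delta)
        theta = theta - a * ghat
    return theta

theta_hat = spsa(np.array([2.0, 0.0, 0.0]))   # initial guess: stop if pi[0] >= 0.5
print("estimated linear threshold:", theta_hat)

The appeal of an SPSA-style scheme here is that it needs only noisy cost evaluations from simulation, two per iteration regardless of the dimension of theta, rather than any gradient of the POMDP value function, which is generally unavailable in closed form.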
