On Solving Event-Based Optimization With Average Reward Over Infinite Stages

Event-based optimization (EBO) provides a unified framework for problems in which decisions can be made only when certain events occur. Because the event sequence is usually not Markovian, the optimal policy may depend on the entire event history, which is hard to implement in practice. Most existing studies therefore focus on memoryless policies, which make decisions based only on the currently observable event. However, it remains open how to find optimal memoryless policies in general, let alone how to solve the EBO optimally. In this technical note, we address these two important questions for infinite-stage EBOs with finite state and action spaces, and make the following three major contributions. First, we extend our previous studies on finite-stage EBOs and convert infinite-stage EBOs into partially observable Markov decision processes (POMDPs). The belief process of such a POMDP is called a belief-event decision process (BEDP). Under certain well-known conditions, the optimal policies of BEDPs can be achieved within the class of stationary Markov deterministic policies. Second, assuming optimal stationary policies exist, performance difference and performance derivative formulas are developed. The potentials of memoryless event-based policies are shown to be piecewise linear functions, and thus can be efficiently estimated from sample paths. Third, a potential-based approximate policy iteration algorithm is developed to obtain near-optimal memoryless policies. The convergence and performance-loss bound of this algorithm are analyzed.
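To illustrate the kind of potential-based policy iteration the abstract refers to, the following is a minimal sketch on an ordinary finite average-reward MDP with a toy two-state example. This is a hypothetical illustration, not the paper's algorithm: the paper's method operates on memoryless event-based policies and estimates potentials from sample paths, whereas here the potentials (relative values) are computed exactly by solving the Poisson equation h + g·1 = r + P·h with the normalization h(0) = 0.

```python
import numpy as np

def evaluate(P, r):
    """Solve the Poisson equation h + g*1 = r + P h with h[0] = 0.
    Returns the average reward (gain) g and the potential vector h."""
    n = len(r)
    # Unknown vector x = [g, h[1], ..., h[n-1]]; row s reads
    # g + h[s] - sum_{s'} P[s, s'] h[s'] = r[s].
    A = np.zeros((n, n))
    A[:, 0] = 1.0                         # coefficient of g
    A[:, 1:] = np.eye(n)[:, 1:] - P[:, 1:]
    sol = np.linalg.solve(A, r)
    return sol[0], np.concatenate(([0.0], sol[1:]))

def policy_iteration(P_a, r_a, n_iter=100):
    """P_a[a] is the transition matrix and r_a[a] the reward vector
    under action a. Returns a (near-)optimal stationary policy and
    its average reward, assuming each induced chain is unichain."""
    n = P_a[0].shape[0]
    policy = np.zeros(n, dtype=int)
    for _ in range(n_iter):
        # Evaluate the current policy.
        P = np.array([P_a[policy[s]][s] for s in range(n)])
        r = np.array([r_a[policy[s]][s] for s in range(n)])
        g, h = evaluate(P, r)
        # Greedy improvement via potentials: maximize
        # r(s, a) + sum_{s'} P(s'|s, a) h(s') over actions a.
        new_policy = np.array([
            int(np.argmax([r_a[a][s] + P_a[a][s] @ h
                           for a in range(len(P_a))]))
            for s in range(n)
        ])
        if np.array_equal(new_policy, policy):
            break
        policy = new_policy
    return policy, g

# Toy example: action 0 moves to state 0 (reward 1),
# action 1 moves to state 1 (reward 2); always taking
# action 1 is optimal with average reward 2.
P_a = [np.array([[1., 0.], [1., 0.]]),
       np.array([[0., 1.], [0., 1.]])]
r_a = [np.array([1., 1.]), np.array([2., 2.])]
policy, gain = policy_iteration(P_a, r_a)
```

The improvement step is where the paper's contribution bites: for event-based policies the potentials are shown to be piecewise linear in the belief, so they can be estimated from a single sample path instead of solved for exactly as above.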
