Performance Loss Bound for State Aggregation in a Class of Supply Demand Matching Systems

State aggregation is commonly used to handle large-scale Markov decision processes (MDPs). Despite its computational advantage, state aggregation may introduce errors into the estimated value functions of states and thereby degrade the objective value. Many cyber-physical energy systems (CPES), including supply demand matching systems, are discrete event dynamic systems that can usually be formulated as MDPs, so it is of great practical interest to study performance loss bounds for state aggregation in large-scale MDPs. In this paper, we consider the performance loss bound for state aggregation in a class of supply demand matching systems. The states of these systems consist of two types of variables: action-based and action-free. We provide a method for aggregating states that reduces the size of the state space and thus saves memory and computing budget. We make the following contributions. First, we derive performance loss bounds for two sets of naive state aggregations, based on which we propose that the action-free variables should be aggregated first when the true value functions or Q-factors are unknown. Second, we propose a k-means-based method for aggregating states that accounts for the features of the state variables. Third, we consider the battery charging problem of shared electric vehicles (EVs) in a smart grid and test the proposed algorithm on it. The results are consistent with the performance loss bounds and show that the proposed method performs well.
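To make the second contribution concrete, the following is a minimal sketch of k-means-based state aggregation; it is a hypothetical, simplified illustration rather than the paper's algorithm, and the feature layout, the down-weighting of action-free variables, and the function names are assumptions introduced here for illustration only. The idea it encodes is that states whose differences lie mainly in action-free variables are merged into the same aggregated state first.

import numpy as np
from sklearn.cluster import KMeans

def aggregate_states(state_matrix, n_clusters, action_free_idx, action_free_weight=0.5):
    # state_matrix: one row per MDP state, one column per state variable.
    # Down-weighting the action-free columns makes states that differ only in
    # those variables more likely to fall into the same aggregated state.
    X = np.asarray(state_matrix, dtype=float).copy()
    X[:, action_free_idx] *= action_free_weight
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(X)
    return labels  # labels[i] is the index of the aggregated (macro) state of state i

# Toy usage: 200 states with two action-based variables and one action-free variable.
rng = np.random.default_rng(0)
states = rng.uniform(0.0, 1.0, size=(200, 3))
macro = aggregate_states(states, n_clusters=10, action_free_idx=[2])
print("number of aggregated states:", len(np.unique(macro)))

The aggregated labels can then index a smaller value table or Q-table, which is where the memory and computation savings described above come from.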
