Index Policies for Demand Response

Demand response programs incentivize loads to actively moderate their energy consumption in support of the power system. Uncertainty is intrinsic to demand response because a load's curtailment capability is often unknown until the load has been deployed. Algorithms must therefore balance exploiting well-characterized, good loads against learning about poorly characterized but potentially good ones; this is a manifestation of the classical tradeoff between exploration and exploitation. We address the tradeoff in a restless bandit framework, a generalization of the well-known multi-armed bandit problem. The formulation yields index policies in which loads are ranked by a scalar index and those with the highest indices are deployed. Index policies are particularly well suited to demand response because the indices have explicit analytical expressions that can be evaluated separately for each load, making the policy both simple and scalable. The formulation also serves as a heuristic basis for the case in which only the aggregate effect of demand response is observed, so that the state of each individual load must be inferred. We derive a tractable, analytical approximation for inferring individual load states from observations of aggregate load curtailments. In numerical examples, the restless bandit policy outperforms the greedy policy by 5%-10% of the total cost. When the states of deployed loads are inferred from aggregate measurements, the resulting performance degradation of the (now heuristic) restless bandit policy is on the order of a few percent.
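The selection step of an index policy can be sketched in a few lines: each load's index is computed independently, and the operator deploys the loads with the highest indices. The sketch below is illustrative only; the index values are placeholders standing in for the paper's analytical index expressions, and the function name `deploy_by_index` is a hypothetical helper, not from the paper.

```python
import numpy as np

def deploy_by_index(indices: np.ndarray, k: int) -> np.ndarray:
    """Rank loads by their scalar indices and return the IDs of the
    k highest-index loads (the loads selected for deployment)."""
    # argsort ascending, then reverse for a descending ranking.
    order = np.argsort(indices)[::-1]
    return order[:k]

# Placeholder per-load indices; in the paper these would come from
# explicit analytical expressions evaluated separately per load.
idx = np.array([0.3, 0.9, 0.1, 0.7])
selected = deploy_by_index(idx, k=2)
print(selected.tolist())  # loads 1 and 3 have the highest indices
```

Because each index depends only on that load's own state, the ranking scales linearly in the number of loads plus the cost of the sort, which is what makes the policy practical for large demand response populations.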
