Online Residential Demand Response via Contextual Multi-Armed Bandits

Residential loads have great potential to improve the efficiency and reliability of electricity systems through demand response (DR) programs. A major challenge in residential DR is learning and handling unknown and uncertain customer behaviors. In this letter, we consider the residential DR problem in which the load service entity (LSE) aims to select an optimal subset of customers to optimize a DR performance metric, such as maximizing the expected load reduction under a financial budget or minimizing the expected squared deviation from a target reduction level. To learn the uncertain customer behaviors, which are influenced by various time-varying environmental factors, we formulate residential DR as a contextual multi-armed bandit (MAB) problem and develop an online learning and selection (OLS) algorithm based on Thompson sampling to solve it. The algorithm takes contextual information into account and is applicable to complicated DR settings. Numerical simulations demonstrate the learning effectiveness of the proposed algorithm.
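To make the idea of Thompson-sampling-based customer selection concrete, the following is a minimal sketch and not the OLS algorithm of this letter. It rests on several simplifying assumptions that do not come from the text: the context is discretized into a small set of named buckets (for example "hot" and "mild"), each customer's response in each bucket is an independent Bernoulli variable with a Beta(1, 1) prior, and the budget is modeled as a fixed number k of customers selected per round. The names `ContextualThompsonSelector`, `simulate`, and `true_prob` are hypothetical.

```python
import random
from collections import defaultdict


class ContextualThompsonSelector:
    """Beta-Bernoulli Thompson sampling, one posterior per (customer, context bucket).

    Assumption: contexts are discrete labels; this is a simplification for illustration.
    """

    def __init__(self, n_customers, seed=0):
        self.n_customers = n_customers
        self.rng = random.Random(seed)
        # Beta(1, 1) prior stored as [alpha, beta] for each (customer, context) pair.
        self.posterior = defaultdict(lambda: [1.0, 1.0])

    def select(self, context, k):
        """Draw one response probability per customer and return the k largest draws."""
        samples = []
        for i in range(self.n_customers):
            alpha, beta = self.posterior[(i, context)]
            samples.append((self.rng.betavariate(alpha, beta), i))
        samples.sort(reverse=True)
        return [i for _, i in samples[:k]]

    def update(self, context, customer, responded):
        """Conjugate update from one observed binary response."""
        self.posterior[(customer, context)][0 if responded else 1] += 1.0


def simulate(true_prob, contexts, k, rounds, seed=0):
    """Run the selector against hypothetical ground-truth response probabilities.

    true_prob[i][c] is the (assumed) probability that customer i responds in context c.
    """
    env = random.Random(seed + 1)
    selector = ContextualThompsonSelector(len(true_prob), seed)
    total_responses = 0
    for t in range(rounds):
        context = contexts[t % len(contexts)]  # contexts cycle deterministically
        for i in selector.select(context, k):
            responded = env.random() < true_prob[i][context]
            selector.update(context, i, responded)
            total_responses += int(responded)
    return selector, total_responses
```

Calling `simulate([{"hot": 0.9, "mild": 0.1}, {"hot": 0.1, "mild": 0.9}], ["hot", "mild"], k=1, rounds=200)` runs 200 selection rounds and returns the fitted selector together with the number of observed responses. Sampling from the posterior, rather than ranking customers by posterior mean, is what provides exploration: customers with few observations have wide posteriors and are occasionally selected. A model with continuous contexts would replace the per-bucket Beta posteriors with a posterior over regression parameters.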
