Optimal Demand Response Using Device-Based Reinforcement Learning

Demand response (DR) for residential and small commercial buildings is estimated to account for as much as 65% of the total energy savings potential of DR, and previous work shows that a fully automated energy management system (EMS) is a necessary prerequisite for DR in these sectors. In this paper, we propose a novel EMS formulation for DR problems in these sectors. Specifically, we formulate the rescheduling problem of a fully automated EMS as a reinforcement learning (RL) problem, and argue that this RL problem can be approximately solved by decomposing it over device clusters. Compared with existing formulations, our formulation does not require explicitly modeling the user's dissatisfaction with job rescheduling, enables the EMS to self-initiate jobs, allows the user to initiate more flexible requests, and has computational complexity linear in the number of device clusters. We also present simulation results from applying Q-learning, one of the most popular classical RL algorithms, to a representative example.
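As a rough illustration of the kind of problem described above, the following is a minimal sketch of tabular Q-learning for a single device cluster that holds one deferrable job. It is not the paper's formulation: the price vector `PRICES`, the penalty `MISS_PENALTY`, the state (the current time slot while the job is pending), the two actions, and all hyperparameters are hypothetical choices made only for this example.

```python
import random

# Hypothetical toy setup (not from the paper): one pending job, four time slots.
PRICES = [3.0, 1.0, 2.0, 4.0]   # assumed electricity price in each slot
MISS_PENALTY = 10.0             # assumed cost if the job is never run
WAIT, RUN = 0, 1                # actions available while the job is pending


def step(t, action):
    """Return (reward, next_slot, done) for a pending job at slot t."""
    if action == RUN:
        return -PRICES[t], t + 1, True       # pay the current price, job finished
    if t == len(PRICES) - 1:
        return -MISS_PENALTY, t + 1, True    # waited past the last slot
    return 0.0, t + 1, False                 # defer the job by one slot


def q_learning(episodes=2000, alpha=0.1, gamma=1.0, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration; q[t][a] is the table."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in PRICES]
    for _ in range(episodes):
        t, done = 0, False
        while not done:
            if rng.random() < epsilon:
                a = rng.choice((WAIT, RUN))                   # explore
            else:
                a = max((WAIT, RUN), key=lambda x: q[t][x])   # exploit
            r, t_next, done = step(t, a)
            # Standard Q-learning target: bootstrap from the best next action.
            target = r if done else r + gamma * max(q[t_next])
            q[t][a] += alpha * (target - q[t][a])
            t = t_next
    return q


def greedy_schedule(q):
    """Return the first slot in which the greedy policy runs the job, or None."""
    for t in range(len(PRICES)):
        if q[t][RUN] >= q[t][WAIT]:
            return t
    return None


if __name__ == "__main__":
    table = q_learning()
    print("greedy policy runs the job in slot", greedy_schedule(table))
```

In this toy setup the cheapest slot is index 1, so a converged table should defer the job to that slot. Under the decomposition described above, one such small table would be learned per device cluster, which is one way to read the claim that the cost grows linearly with the number of clusters.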
