Constrained Thompson Sampling for Real-Time Electricity Pricing With Grid Reliability Constraints

We consider an electricity aggregator that must learn its customers’ electricity usage models while running a load shaping program by broadcasting dispatch signals in real time. We adopt a multi-armed bandit formulation to account for the stochastic and unknown nature of customers’ responses to dispatch signals. We propose a constrained Thompson sampling heuristic, Con-TS-RTP, for the aggregator’s load shaping problem of influencing customers’ usage to match various desired demand profiles (e.g., to reduce demand at peak hours, integrate more intermittent renewable generation, or track a desired daily load profile). Con-TS-RTP accommodates day-varying target load profiles (i.e., multiple target load profiles reflecting renewable forecasts and desired demand patterns) and accounts for the operational constraints of the distribution system, so that customers receive adequate service and potential grid failures are avoided. We discuss regret bounds for the algorithm and the reliability with which the distribution system’s constraints are upheld throughout the learning process.
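
The idea of sampling a demand model and then choosing a dispatch signal subject to a grid constraint can be illustrated with a minimal sketch. This is not the Con-TS-RTP heuristic itself, whose details the abstract does not give. It assumes a finite set of price signals (arms), an independent Gaussian load response per arm with known noise, a single scalar capacity limit standing in for the distribution system's constraints, and a fallback to the lowest sampled load when no arm looks feasible. The function name `constrained_ts` and all parameters are hypothetical.

```python
import random


def constrained_ts(true_means, targets, limit, noise_sd=1.0,
                   prior_mean=0.0, prior_var=100.0, seed=0):
    """Illustrative constrained Thompson sampling over discrete price signals.

    true_means: unknown mean load (e.g. kW) per price signal; used only to
                simulate customer responses.
    targets:    one target load per round (may vary from round to round).
    limit:      assumed capacity limit that the sampled load must not exceed.
    """
    rng = random.Random(seed)
    k = len(true_means)
    mean = [prior_mean] * k   # posterior means, one per arm
    var = [prior_var] * k     # posterior variances, one per arm
    history = []
    for target in targets:
        # Draw one plausible load response per arm from the posterior.
        sampled = [rng.gauss(mean[a], var[a] ** 0.5) for a in range(k)]
        # Keep only arms whose sampled load respects the assumed limit.
        feasible = [a for a in range(k) if sampled[a] <= limit]
        if feasible:
            # Track the target as closely as the sampled model allows.
            arm = min(feasible, key=lambda a: abs(sampled[a] - target))
        else:
            # Assumed fallback: pick the arm with the lowest sampled load.
            arm = min(range(k), key=lambda a: sampled[a])
        # Simulated noisy customer response to the broadcast signal.
        load = rng.gauss(true_means[arm], noise_sd)
        # Conjugate Gaussian update with known noise variance.
        precision = 1.0 / var[arm] + 1.0 / noise_sd ** 2
        mean[arm] = (mean[arm] / var[arm] + load / noise_sd ** 2) / precision
        var[arm] = 1.0 / precision
        history.append((arm, load))
    return mean, var, history


if __name__ == "__main__":
    # Two alternating target loads stand in for day-varying profiles.
    mean, var, history = constrained_ts([2.0, 4.0, 6.0], [3.0, 5.0] * 50, limit=5.0)
    print("posterior means:", [round(m, 2) for m in mean])
```

Because feasibility is checked against the sampled model rather than the true one, this sketch can still violate the limit while it is learning. Controlling that risk is the kind of reliability question the abstract says the paper discusses.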
