Online learning and pricing for demand response in smart distribution networks

This paper considers the problem of online learning of consumer response to retail electricity pricing in a distribution network. In a two-settlement market, the retailer who sets the retail price is exposed to risk from both the stochastic response of its consumers and real-time price fluctuations in the wholesale market. The price that maximizes expected profit is a function of the consumers' response to prices, so any pricing scheme operating under an unknown demand model accumulates regret, measured as the difference between the retailer's total expected profit when the demand model is known and when it is unknown. The paper presents an online learning approach to dynamic pricing that aims to minimize the retailer's regret for consumers with unknown Markov jump affine demand. The proposed policy is shown to achieve the lowest possible order of regret growth, which is the square root of the learning horizon.
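The notion of regret used above can be illustrated with a minimal sketch. It assumes a much simpler setting than the paper's: a single, static affine demand `d = a - b * p` (no Markov jumps, no noise) and a fixed wholesale cost `cost`. The names `optimal_price`, `expected_profit`, and `cumulative_regret` are hypothetical and chosen only for illustration; this is not the paper's pricing policy.

```python
def optimal_price(a, b, cost):
    """Price maximizing (p - cost) * (a - b * p) for assumed affine demand."""
    # Setting the derivative a - 2*b*p + b*cost to zero gives the maximizer.
    return (a + b * cost) / (2 * b)


def expected_profit(p, a, b, cost):
    """Expected per-period profit at price p under the assumed demand model."""
    return (p - cost) * (a - b * p)


def cumulative_regret(prices, a, b, cost):
    """Running profit gap between the known-model optimum and posted prices."""
    best = expected_profit(optimal_price(a, b, cost), a, b, cost)
    total = 0.0
    regret = []
    for p in prices:
        # Each period adds the profit lost by not posting the optimal price.
        total += best - expected_profit(p, a, b, cost)
        regret.append(total)
    return regret


# Illustrative (made-up) parameters: a = 10, b = 2, cost = 1.
print(cumulative_regret([3, 2, 4], a=10, b=2, cost=1))
```

In this simplified model the per-period regret equals `b * (p - p_star) ** 2`, so a policy that keeps mispricing by a constant amount accumulates regret linearly in the horizon, whereas a policy whose prices converge quickly enough to the optimum can grow sublinearly.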
