Paper information - The Multi-Armed Bandit With Stochastic Plays

We extend the stochastic multi-armed bandit to the case where the number of arms to play evolves as a stationary process. Our work is motivated by demand response in power systems, in which the number of arms to play, or loads to dispatch, depends on a random power imbalance. We give an upper confidence bound-based algorithm that achieves sublinear pseudo-regret. We apply our results in several examples from demand response.
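As a rough illustration of the setting described in the abstract, the sketch below plays a random number of arms in each round and picks them by an upper confidence bound index. It is not the authors' algorithm: the UCB1-style index, the Bernoulli rewards, the i.i.d. uniform draw of the number of plays, and every name in the code (`ucb_stochastic_plays`, `means`, `max_plays`) are assumptions made only for this example.

```python
import math
import random


def ucb_stochastic_plays(means, horizon, max_plays, seed=0):
    """Each round, play the k arms with the largest UCB index, k random.

    Assumptions for illustration only: Bernoulli rewards with the given
    means, and k drawn i.i.d. uniformly from {1, ..., max_plays}.
    """
    rng = random.Random(seed)
    n_arms = len(means)
    counts = [0] * n_arms   # number of times each arm has been played
    sums = [0.0] * n_arms   # cumulative observed reward of each arm
    best_order = sorted(means, reverse=True)
    pseudo_regret = 0.0

    def index(i, t):
        # Unplayed arms get an infinite index so they are tried first.
        if counts[i] == 0:
            return float("inf")
        return sums[i] / counts[i] + math.sqrt(2.0 * math.log(t) / counts[i])

    for t in range(1, horizon + 1):
        # Hypothetical stand-in for the random power imbalance.
        k = rng.randint(1, max_plays)
        ranked = sorted(range(n_arms), key=lambda i: index(i, t), reverse=True)
        chosen = ranked[:k]
        for i in chosen:
            reward = 1.0 if rng.random() < means[i] else 0.0
            counts[i] += 1
            sums[i] += reward
        # Gap between the best k expected rewards and those actually chosen.
        pseudo_regret += sum(best_order[:k]) - sum(means[i] for i in chosen)

    return counts, pseudo_regret


counts, regret = ucb_stochastic_plays(
    [0.9, 0.8, 0.5, 0.2, 0.1], horizon=2000, max_plays=2
)
print(counts, round(regret, 2))
```

In a demand response reading, each arm would stand for a load and `k` for the number of loads to dispatch in that round. The simulation tracks pseudo-regret against an oracle that always dispatches the `k` loads with the highest expected reward.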

Antoine Lesage-Landry | Joshua A. Taylor
