Paper Information - Reinforcement Learning for Decision-Making and Control in Power Systems: Tutorial, Review, and Vision

With the large-scale integration of renewable generation and distributed energy resources (DERs), modern power systems face new operational challenges, such as growing complexity, increasing uncertainty, and heightened volatility. Meanwhile, more data are becoming available owing to the widespread deployment of smart meters, smart sensors, and upgraded communication networks. As a result, data-driven control techniques, especially reinforcement learning (RL), have attracted surging attention in recent years. In this paper, we provide a tutorial on various RL techniques and how they can be applied to decision-making and control in power systems. We illustrate RL-based models and solutions in three key applications: frequency regulation, voltage control, and energy management. We conclude with three critical issues in the application of RL, namely safety, scalability, and data. Several potential future directions are discussed as well.

Guannan Qu | Yujie Tang | Na Li | Steven Low | Xin Chen
