Approximate Value Iteration in the Reinforcement Learning Context. Application to Electrical Power System Control.

In this paper we explain how to design intelligent agents that process the information acquired from interaction with a system in order to learn a good control policy, and we show how the methodology can be applied to control devices designed to damp electrical power oscillations. The control problem is formalized as a discrete-time optimal control problem. The information acquired from interaction with the system is a set of samples, each composed of four elements: a state, the action taken in that state, the instantaneous reward observed, and the successor state of the system. To process this information we consider reinforcement learning algorithms that compute an approximation of the so-called Q-function by mimicking the behavior of the value iteration algorithm. Simulations are first carried out on a benchmark power system modeled with two state variables. We then present a more complex case study on a four-machine power system, in which the reinforcement learning algorithm controls a Thyristor Controlled Series Capacitor (TCSC) used to damp power system oscillations.
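The idea of approximating the Q-function from four-element samples by mimicking value iteration can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the five-state chain, the reward of 1 for arriving in the last state, the discount factor `gamma`, the names `fitted_q_iteration` and `greedy_policy`, and the use of a plain per-(state, action) average as the regression step, which stands in for whatever function approximator is actually used.

```python
from collections import defaultdict


def fitted_q_iteration(samples, actions, gamma=0.9, n_iterations=50):
    """Approximate Q from (state, action, reward, next_state) samples.

    Each pass mimics one value iteration step: every sample yields the
    target r + gamma * max_a Q(next_state, a), and the new Q is fitted
    to those targets. The "fit" here is an average per (state, action)
    pair, an illustrative stand-in for a real regression method.
    """
    q = defaultdict(float)  # Q_0 is zero everywhere
    for _ in range(n_iterations):
        sums = defaultdict(float)
        counts = defaultdict(int)
        for x, u, r, x_next in samples:
            target = r + gamma * max(q[(x_next, a)] for a in actions)
            sums[(x, u)] += target
            counts[(x, u)] += 1
        new_q = defaultdict(float)
        for key, total in sums.items():
            new_q[key] = total / counts[key]
        q = new_q
    return q


def greedy_policy(q, actions):
    """Return the policy that picks the action with the largest Q-value."""
    return lambda x: max(actions, key=lambda a: q[(x, a)])


# Hypothetical toy system: states 0..4 on a line, actions move left or right,
# and a reward of 1 is received whenever the successor state is 4.
ACTIONS = (-1, 1)


def step(x, u):
    x_next = min(max(x + u, 0), 4)
    reward = 1.0 if x_next == 4 else 0.0
    return reward, x_next


# One sample per (state, action) pair, as if gathered by interaction.
samples = []
for x in range(5):
    for u in ACTIONS:
        r, x_next = step(x, u)
        samples.append((x, u, r, x_next))

q = fitted_q_iteration(samples, ACTIONS)
policy = greedy_policy(q, ACTIONS)
print([policy(x) for x in range(5)])
```

In this toy setting the greedy policy derived from the approximate Q-function always moves toward the rewarding state. On a continuous state space, such as the power system models considered here, the averaging table would have to be replaced by a regression method able to generalize across states.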
