Design and Real-Time Implementation of Optimal Power System Wide-Area System-Centric Controller Based on Temporal Difference Learning

This paper presents and tests a novel framework for designing and implementing a coordinated wide-area controller architecture that improves power system dynamic stability. The algorithm is an optimal wide-area system-centric controller and observer based on a hybrid reinforcement learning and temporal difference framework. It addresses three major concerns of the wide-area monitoring problem: delays in signal transmission, uncertainty in the communication network, and data traffic. The main advantage of this design is its ability to learn from the past using eligibility traces and to predict the optimal trajectory of the cost function through the temporal difference method. The control algorithm evolves from adaptive critic design (ACD) and runs online over a finite horizon using both the backward and forward views. The ACD controller's training and testing are implemented on an Innovative Integration Picolo card integrated with a TMS320C28335 processor. Results on a real experimental test bed using a real power system feeder show that this architecture provides better stability than conventional schemes.
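To make the role of eligibility traces in the temporal difference method concrete, the following is a minimal sketch of backward-view TD(λ) with accumulating traces. It is an illustration only, not the paper's controller: it uses a tabular cost-to-go estimate on a hypothetical deterministic chain instead of the ACD networks and the power system model, and the function name `td_lambda_chain` and all parameter values are assumptions chosen for the example.

```python
def td_lambda_chain(n_states=5, episodes=500, alpha=0.1, gamma=1.0, lam=0.8):
    """Estimate cost-to-go on a toy chain with backward-view TD(lambda).

    Assumed toy problem (not from the paper): the state moves
    0 -> 1 -> ... -> n_states - 1 -> terminal, and every step costs 1.
    """
    values = [0.0] * n_states  # tabular cost-to-go estimates
    for _ in range(episodes):
        traces = [0.0] * n_states  # eligibility traces, reset each episode
        for s in range(n_states):
            cost = 1.0
            # The terminal state has zero cost-to-go.
            next_value = values[s + 1] if s + 1 < n_states else 0.0
            # Temporal difference error for the current transition.
            td_error = cost + gamma * next_value - values[s]
            # Accumulating trace: mark the current state as eligible.
            traces[s] += 1.0
            # Credit the TD error to all recently visited states,
            # then decay their traces.
            for i in range(n_states):
                values[i] += alpha * td_error * traces[i]
                traces[i] *= gamma * lam
    return values


if __name__ == "__main__":
    for state, value in enumerate(td_lambda_chain()):
        print(f"state {state}: estimated cost-to-go {value:.3f}")
```

In this toy setting the true cost-to-go of state `s` is `n_states - s`, so the estimates should approach that value. The trace vector is what lets a single TD error update earlier states in the same episode, which is the "learning from the past" behavior the abstract refers to.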
