Design and Real-Time Implementation of Optimal Power System Wide-Area System-Centric Controller Based on Temporal Difference Learning

This paper presents and tests a novel framework for designing and implementing a coordinated wide-area controller architecture that improves power system dynamic stability. The algorithm is an optimal wide-area system-centric controller and observer based on a hybrid reinforcement learning and temporal difference framework. It allows the system to deal with the major concerns of the wide-area monitoring problem: delays in signal transmission, uncertainty in the communication network, and data traffic. The main advantage of this design is its ability to learn from the past using eligibility traces and to predict the optimal trajectory of the cost function through the temporal difference method. The control algorithm evolves from adaptive critic design (ACD) and runs online over a finite horizon using both the backward and forward views. Training and testing of the ACD controller are implemented on the Innovative Integration Picolo card integrated with a TMS320C28335 processor. Results on an experimental test bed using a real power system feeder show that this architecture provides better stability than conventional schemes.
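
To make the role of eligibility traces concrete, the following is a minimal sketch of tabular TD(λ) in its backward view, predicting the cost-to-go of a fixed policy. It is an illustration only and not the controller described here: the function name `td_lambda`, the three-state chain, and the parameter values are all assumptions chosen for the example.

```python
def td_lambda(episodes, n_states, alpha=0.1, gamma=1.0, lam=0.8):
    """Tabular TD(lambda) with accumulating eligibility traces (backward view).

    Each episode is a list of (state, cost, next_state) transitions;
    next_state is None when the transition ends the episode.
    All names and defaults here are hypothetical, for illustration only.
    """
    values = [0.0] * n_states  # estimated cost-to-go per state
    for episode in episodes:
        traces = [0.0] * n_states  # eligibility traces, reset each episode
        for state, cost, next_state in episode:
            next_value = 0.0 if next_state is None else values[next_state]
            # Temporal difference error: one-step target minus current estimate.
            td_error = cost + gamma * next_value - values[state]
            # Decay every trace, then mark the state just visited.
            traces = [gamma * lam * e for e in traces]
            traces[state] += 1.0
            # Credit the error to recently visited states in proportion to their trace.
            for s in range(n_states):
                values[s] += alpha * td_error * traces[s]
    return values


# Assumed toy problem: a chain 0 -> 1 -> 2 -> end with unit cost per step.
episode = [(0, 1.0, 1), (1, 1.0, 2), (2, 1.0, None)]
estimates = td_lambda([episode] * 500, n_states=3)
print(estimates)  # estimates approach the true cost-to-go of 3, 2, 1
```

The traces are what let a single error signal update earlier states in the trajectory, which is the sense in which the method learns from the past; a full adaptive critic design would replace the table with function approximators for the critic and the controller.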
