A Novel Dual Iterative $Q$-Learning Method for Optimal Battery Management in Smart Residential Environments

In this paper, a novel iterative Q-learning method, called the “dual iterative Q-learning algorithm,” is developed to solve the optimal battery management and control problem in smart residential environments. The algorithm uses two nested iterations: an internal iteration, which minimizes the total cost of power loads in each period, and an external iteration, which makes the iterative Q-function converge to the optimum. Based on the dual iterative Q-learning algorithm, the convergence property of the iterative Q-learning method for the optimal battery management and control problem is proven for the first time, which guarantees that both the iterative Q-function and the iterative control law reach the optimum. The algorithm is implemented with neural networks, and numerical results and comparisons are given to illustrate its performance.
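The two-loop structure can be illustrated with a minimal tabular sketch. This is not the paper's neural-network implementation, and the mapping of the loops is only one plausible reading of the abstract: the inner loop sweeps the hours of one period, and the outer loop repeats whole periods until the Q-table stops changing. The discrete battery levels, the three-action set, the no-sell-back cost model, the discount factor `gamma`, and the names `dual_iteration_q` and `greedy_action` are all assumptions made for illustration.

```python
import math

ACTIONS = (-1, 0, 1)  # assumed actions: discharge, idle, charge (one level per hour)

def dual_iteration_q(prices, loads, n_levels=3, gamma=0.95, tol=1e-8, max_outer=1000):
    """Toy tabular stand-in for the two nested iterations (hypothetical model)."""
    T = len(prices)
    # Q[t][s][k]: cost-to-go at hour t, battery level s, action ACTIONS[k]
    Q = [[[0.0] * len(ACTIONS) for _ in range(n_levels)] for _ in range(T)]
    for outer in range(1, max_outer + 1):   # external iteration: repeat whole periods
        change = 0.0
        for t in reversed(range(T)):        # internal iteration: hours of one period
            t_next = (t + 1) % T            # the period wraps around
            for s in range(n_levels):
                for k, a in enumerate(ACTIONS):
                    s_next = s + a
                    if not 0 <= s_next < n_levels:
                        Q[t][s][k] = math.inf  # action would leave the battery range
                        continue
                    # Power bought from the grid; assumes no selling back to the grid.
                    grid = max(loads[t] + a, 0.0)
                    new = prices[t] * grid + gamma * min(Q[t_next][s_next])
                    change = max(change, abs(new - Q[t][s][k]))
                    Q[t][s][k] = new
        if change < tol:                    # Q-table has stopped changing
            break
    return Q, outer

def greedy_action(Q, t, s):
    """Action with the smallest Q-value at hour t and battery level s."""
    row = Q[t][s]
    return ACTIONS[row.index(min(row))]

# Usage: a four-hour period with two cheap hours followed by two expensive hours.
prices = [1.0, 1.0, 5.0, 5.0]
loads = [1.0, 1.0, 1.0, 1.0]
Q, sweeps = dual_iteration_q(prices, loads)
print(sweeps, [greedy_action(Q, t, 2) for t in range(len(prices))])
```

Because `gamma` is below 1, each full sweep of the period is a contraction, so the outer loop terminates in this toy setting. The paper's own convergence proof concerns its dual iterative Q-learning algorithm and is not reproduced by this sketch.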
