RESEARCH ON A REINFORCEMENT LEARNING ALGORITHM BASED ON NEURAL NETWORK

Back-propagation (BP) neural networks are widely used as controllers for nonlinear systems. As a supervised training algorithm, however, BP requires input-output pairs for training, and in some systems such pairs cannot be obtained under the optimal control policy. Reinforcement learning (RL), by contrast, learns behavior through trial-and-error interaction with a dynamic environment; it requires no supervised input-output pairs and operates on-line. This paper presents the RBP model, which adapts the BP network for use in RL. The main idea of RBP is that RL learns the optimal policy from the environment and stores that policy in the network. Instead of being updated after every step, the network weights are updated periodically in batch mode. A simple example illustrates the validity of the algorithm.
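
The idea of storing an RL policy in a BP network and updating the weights periodically in batch mode can be illustrated with a minimal sketch. It is not the RBP algorithm itself, whose details are not given in this abstract. It assumes a Q-learning style target, a one-hidden-layer network, and a hypothetical five-state chain environment; the names TinyBPNet, step, run, and batch_every are introduced here for illustration only.

```python
import math
import random


class TinyBPNet:
    """One-hidden-layer network trained by backpropagation (illustrative only)."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # The last column of each weight row is the bias weight.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]

    def forward(self, x):
        xb = list(x) + [1.0]
        hb = [math.tanh(sum(w * v for w, v in zip(row, xb))) for row in self.w1] + [1.0]
        q = [sum(w * v for w, v in zip(row, hb)) for row in self.w2]
        return xb, hb, q

    def predict(self, x):
        return self.forward(x)[2]

    def train_batch(self, batch, lr=0.1):
        """Accumulate gradients over the whole batch, then apply one weight update."""
        g1 = [[0.0] * len(r) for r in self.w1]
        g2 = [[0.0] * len(r) for r in self.w2]
        for x, action, target in batch:
            xb, hb, q = self.forward(x)
            err = q[action] - target  # only the taken action's output carries an error
            for j, hv in enumerate(hb):
                g2[action][j] += err * hv
            for i in range(len(self.w1)):
                dh = err * self.w2[action][i] * (1.0 - hb[i] ** 2)  # tanh derivative
                for j, xv in enumerate(xb):
                    g1[i][j] += dh * xv
        n = len(batch)
        for weights, grads in ((self.w1, g1), (self.w2, g2)):
            for row, grow in zip(weights, grads):
                for j in range(len(row)):
                    row[j] -= lr * grow[j] / n


def one_hot(s, n):
    return [1.0 if i == s else 0.0 for i in range(n)]


def step(s, a, n):
    """Hypothetical chain world: action 1 moves right, 0 moves left; reward at the right end."""
    s2 = min(s + 1, n - 1) if a == 1 else max(s - 1, 0)
    done = s2 == n - 1
    return s2, (1.0 if done else 0.0), done


def run(n_states=5, episodes=200, batch_every=20, gamma=0.9, eps=0.2, seed=0):
    rng = random.Random(seed)
    net = TinyBPNet(n_states, 8, 2, seed=seed)
    buffer, updates = [], 0
    for _ in range(episodes):
        s = 0
        for _ in range(50):  # cap on episode length
            q = net.predict(one_hot(s, n_states))
            # Epsilon-greedy trial-and-error action selection.
            a = rng.randrange(2) if rng.random() < eps else max((0, 1), key=lambda k: q[k])
            s2, r, done = step(s, a, n_states)
            # Assumed Q-learning target; the abstract does not specify the RL rule.
            target = r if done else r + gamma * max(net.predict(one_hot(s2, n_states)))
            buffer.append((one_hot(s, n_states), a, target))
            if len(buffer) >= batch_every:  # periodic batch update, not per step
                net.train_batch(buffer)
                buffer.clear()
                updates += 1
            s = s2
            if done:
                break
    return net, updates
```

Calling run() returns the trained network and the number of batch updates performed. Here the targets are collected during interaction and the weights change only once every batch_every steps, which is one plausible reading of "updated in batch mode periodically".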