Conditions for the convergence of one-layer networks under reinforcement learning

An extension to the convergence theorem for single neurons learning under the associative reward-penalty (A_R-P) algorithm is proved. The extension shows that if the conditions of the single-neuron theorem are satisfied and if the environment satisfies one of two sufficient conditions, the weights in an arbitrarily large one-layer network will converge with probability one to values for which the network correctly classifies the training input set. One condition requires that, for all output vectors, the probability of the reinforcement being one (success) takes one of only two values: output vectors having at least l_k correct elements receive the higher probability, whereas output vectors having fewer than l_k correct elements receive the lower probability. The alternative condition requires that the reinforcement have a higher probability of being one for output vectors having a greater number of correct elements. The extension and its proof are significant because they further the understanding of the factors affecting the convergence of multilayer networks under reinforcement learning.
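The two environmental conditions can be stated compactly. The sketch below is a minimal formalization of the wording above, not notation taken from the paper: y denotes an output vector, c(y) its number of correctly classified elements, l_k the threshold mentioned in the abstract, r the binary reinforcement signal, and p_+ and p_- the assumed higher and lower success probabilities.

```latex
% Condition 1 (two-level reinforcement): the success probability depends only on
% whether the number of correct output elements reaches the threshold l_k.
\Pr\{r = 1 \mid y\} =
\begin{cases}
  p_{+} & \text{if } c(y) \ge l_k,\\
  p_{-} & \text{if } c(y) < l_k,
\end{cases}
\qquad p_{+} > p_{-}.

% Condition 2 (monotone reinforcement): output vectors with more correct
% elements receive a strictly higher probability of success.
c(y) > c(y') \;\Longrightarrow\; \Pr\{r = 1 \mid y\} > \Pr\{r = 1 \mid y'\}.
```

Either condition alone, together with the hypotheses of the single-neuron theorem, is claimed above to suffice for convergence with probability one.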