Learning automata with changing number of actions

A reinforcement scheme based on the linear reward-inaction updating algorithm is presented for a learning automaton whose set of available actions changes from instant to instant. A learning automaton using the algorithm is shown to be both absolutely expedient and ε-optimal. Simulation results confirm the ε-optimality of the algorithm. The results can be extended to the design of general nonlinear absolutely expedient learning algorithms.
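
The abstract does not state the update equations, so the following is only an illustrative sketch of how a linear reward-inaction update might be restricted to the actions available at a given instant. It assumes a scaling approach: the probabilities of the available actions are normalized to sum to one, an action is drawn from that conditional distribution, the reward-inaction rule is applied to the conditional probabilities, and the result is rescaled so that the probabilities of unavailable actions stay untouched. The function name `lri_step`, the step size `a`, and the `reward_prob` environment model are hypothetical names introduced here, not taken from the source.

```python
import random


def lri_step(p, available, reward_prob, a=0.05, rng=random):
    """One reward-inaction style update restricted to the available actions.

    p           : action probabilities over the full action set (sums to 1)
    available   : indices of the actions usable at this instant
    reward_prob : assumed environment, reward_prob[i] = P(reward | action i)
    a           : step size, 0 < a < 1
    Returns (new_p, chosen_action, rewarded).
    """
    # Probability mass currently assigned to the available subset.
    k = sum(p[i] for i in available)
    # Conditional (scaled) probabilities over the available subset; sum to 1.
    scaled = [p[i] / k for i in available]

    # Pick an action from the scaled distribution and query the environment.
    chosen = rng.choices(available, weights=scaled)[0]
    rewarded = rng.random() < reward_prob[chosen]

    new_p = list(p)
    if rewarded:
        # Reward: move the scaled vector toward the chosen action,
        # then rescale by k so unavailable actions keep their probabilities.
        for i, s in zip(available, scaled):
            if i == chosen:
                s_new = s + a * (1.0 - s)
            else:
                s_new = (1.0 - a) * s
            new_p[i] = s_new * k
    # Penalty: inaction, probabilities are left unchanged.
    return new_p, chosen, rewarded
```

Calling `lri_step` repeatedly, with a different `available` subset at each instant, gives a simple way to simulate such an automaton. Under the assumptions above, the total probability remains one after every step because the update only redistributes the mass `k` within the available subset.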