Asynchronous neurocomputing for optimal control and reinforcement learning with large state spaces

We consider two problems related to machine learning: optimal control and reinforcement learning. We show that, even when the state space is very large (possibly infinite), natural algorithmic solutions can be implemented in an asynchronous neurocomputing fashion, that is, by an assembly of interconnected simple neuron-like units that requires no synchronization. From a neuroscience perspective, this work might help explain how an asynchronous assembly of simple units can give rise to efficient control. From a computational point of view, such neurocomputing architectures can exploit their massively parallel structure and be significantly faster than standard sequential approaches. The contributions of this paper are the following: (1) We introduce a theoretically sound methodology for designing a whole class of asynchronous neurocomputing algorithms. (2) We build an original asynchronous neurocomputing architecture for optimal control in a small state space, then show how to improve this architecture so that it also solves the reinforcement learning problem. (3) Finally, we show how to extend this architecture to the case where the state space is large (possibly infinite) by using an asynchronous neurocomputing adaptive approximation scheme. We illustrate this approximation scheme on two continuous-space control problems.
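To give a concrete feel for the kind of computation described above, here is a minimal sketch of asynchronous value iteration on a toy Markov Decision Process, in which each state plays the role of a simple unit that repeatedly performs a local Bellman backup using whatever values its neighbours currently hold, with no global synchronization. The toy MDP, the variable names, and the random update schedule are illustrative assumptions, not the paper's actual architecture.

```python
# Minimal sketch of asynchronous value iteration (assumed illustration, not the
# paper's architecture). Each state is treated as a "unit" that updates its own
# value locally from the current values of its successors, in arbitrary order.

import random

GAMMA = 0.9  # discount factor

# Toy 3-state MDP (hypothetical): P[s][a] = list of (next_state, probability),
# R[s][a] = immediate reward for taking action a in state s.
P = {
    0: {0: [(0, 1.0)], 1: [(1, 1.0)]},
    1: {0: [(0, 0.5), (2, 0.5)], 1: [(2, 1.0)]},
    2: {0: [(2, 1.0)], 1: [(2, 1.0)]},
}
R = {
    0: {0: 0.0, 1: 0.0},
    1: {0: 0.0, 1: 1.0},
    2: {0: 0.0, 1: 0.0},
}

V = {s: 0.0 for s in P}  # shared value estimates read/written by all units


def unit_update(s):
    """Local Bellman backup performed by the unit attached to state s."""
    V[s] = max(
        R[s][a] + GAMMA * sum(p * V[s2] for s2, p in P[s][a])
        for a in P[s]
    )


# Asynchronous schedule: units fire one at a time in arbitrary order; each uses
# whatever neighbour values are currently available (no synchronized sweeps).
for _ in range(500):
    unit_update(random.choice(list(P)))

print({s: round(v, 3) for s, v in V.items()})
```

Under the standard conditions for asynchronous dynamic programming (every state is updated infinitely often), such a schedule converges to the same optimal values as synchronized sweeps, which is what makes a fully distributed, unsynchronized implementation possible.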
