Asynchronous neurocomputing for optimal control and reinforcement learning with large state spaces

We consider two problems related to machine learning: optimal control and reinforcement learning. We show that, even when the state space is very large (possibly infinite), natural algorithmic solutions can be implemented in an asynchronous neurocomputing fashion, that is, by an assembly of interconnected simple neuron-like units that requires no synchronization. From a neuroscience perspective, this work might help explain how an asynchronous assembly of simple units can give rise to efficient control. From a computational point of view, such neurocomputing architectures can exploit their massively parallel structure and be significantly faster than standard sequential approaches. The contributions of this paper are the following: (1) We introduce a theoretically sound methodology for designing a whole class of asynchronous neurocomputing algorithms. (2) We build an original asynchronous neurocomputing architecture for optimal control in a small state space, then show how to improve this architecture so that it also solves the reinforcement learning problem. (3) Finally, we show how to extend this architecture to the case where the state space is large (possibly infinite) by using an asynchronous neurocomputing adaptive approximation scheme. We illustrate this approximation scheme on two continuous-space control problems.
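To give a concrete feel for the kind of computation described above, here is a minimal sketch of asynchronous value iteration on a toy Markov Decision Process, in which each state plays the role of a simple unit that repeatedly performs a local Bellman backup using whatever values its neighbours currently hold, with no global synchronization. The toy MDP, the variable names, and the random update schedule are illustrative assumptions, not the paper's actual architecture.

```python
# Minimal sketch of asynchronous value iteration (assumed illustration, not the
# paper's architecture). Each state is treated as a "unit" that updates its own
# value locally from the current values of its successors, in arbitrary order.

import random

GAMMA = 0.9  # discount factor

# Toy 3-state MDP (hypothetical): P[s][a] = list of (next_state, probability),
# R[s][a] = immediate reward for taking action a in state s.
P = {
    0: {0: [(0, 1.0)], 1: [(1, 1.0)]},
    1: {0: [(0, 0.5), (2, 0.5)], 1: [(2, 1.0)]},
    2: {0: [(2, 1.0)], 1: [(2, 1.0)]},
}
R = {
    0: {0: 0.0, 1: 0.0},
    1: {0: 0.0, 1: 1.0},
    2: {0: 0.0, 1: 0.0},
}

V = {s: 0.0 for s in P}  # shared value estimates read/written by all units


def unit_update(s):
    """Local Bellman backup performed by the unit attached to state s."""
    V[s] = max(
        R[s][a] + GAMMA * sum(p * V[s2] for s2, p in P[s][a])
        for a in P[s]
    )


# Asynchronous schedule: units fire one at a time in arbitrary order; each uses
# whatever neighbour values are currently available (no synchronized sweeps).
for _ in range(500):
    unit_update(random.choice(list(P)))

print({s: round(v, 3) for s, v in V.items()})
```

Under the standard conditions for asynchronous dynamic programming (every state is updated infinitely often), such a schedule converges to the same optimal values as synchronized sweeps, which is what makes a fully distributed, unsynchronized implementation possible.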
