Forward and Bidirectional Planning Based on Reinforcement Learning and Neural Networks in a Simulated Robot

Building intelligent systems capable of learning, acting reactively, and planning actions before executing them is a major goal of artificial intelligence. This paper presents two reactive and planning systems that contain important novelties with respect to previous neural-network planners and reinforcement-learning-based planners: (a) a new component (the "matcher") allows both planners to perform genuine taskable planning, whereas previous reinforcement-learning-based models used planning only to speed up learning; (b) the planners show for the first time that trained neural-network models of the world can generate long prediction chains with notable robustness to noise; (c) two novel algorithms are presented that generate chains of predictions in order to plan and that control the flow of information between the systems' different neural components; (d) one of the planners uses backward "predictions" to exploit knowledge of the pursued goal; (e) both systems integrate reactive behavior and planning on the basis of a measure of "confidence" in action. The soundness and potential of the two reactive and planning systems are tested and compared on a simulated robot engaged in a stochastic path-finding task. The paper also presents an extensive review of the relevant literature.
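
To make the described control flow concrete, the following is a minimal sketch, in Python, of the kind of loop the abstract outlines: a learned world model chained into forward predictions, a "matcher" that recognizes goal states, and a "confidence" measure that arbitrates between reacting and planning. All names and interfaces here (predict_next, match, confidence, plan_forward, the grid-world dynamics) are illustrative assumptions, not the paper's actual neural architecture or algorithms.

```python
# Hypothetical sketch of confidence-gated forward planning with a learned
# world model and a "matcher". Interfaces and dynamics are assumptions for
# illustration only; the paper's components are neural networks.
import random

ACTIONS = ["north", "south", "east", "west"]

def predict_next(state, action):
    """Stand-in for a trained neural world model: maps (state, action)
    to a predicted next state. Here: deterministic moves on a grid."""
    dx, dy = {"north": (0, 1), "south": (0, -1),
              "east": (1, 0), "west": (-1, 0)}[action]
    return (state[0] + dx, state[1] + dy)

def match(state, goal):
    """The 'matcher': signals whether a (predicted) state satisfies
    the currently pursued goal, making the planner taskable."""
    return state == goal

def confidence(state, q_values):
    """Confidence in reactive action: high when one action clearly
    dominates. Here: gap between the two best action values."""
    ranked = sorted(q_values[state].values(), reverse=True)
    return ranked[0] - ranked[1]

def plan_forward(state, goal, max_depth=20):
    """Chain the world model's one-step predictions into prediction
    sequences, stopping when the matcher recognizes the goal."""
    frontier = [(state, [])]
    visited = {state}
    for _ in range(max_depth):
        next_frontier = []
        for s, plan in frontier:
            for a in ACTIONS:
                s2 = predict_next(s, a)
                if match(s2, goal):
                    return plan + [a]
                if s2 not in visited:
                    visited.add(s2)
                    next_frontier.append((s2, plan + [a]))
        frontier = next_frontier
    return None  # no prediction chain reached the goal within the horizon

def act(state, goal, q_values, threshold=0.1):
    """Arbitration: act reactively when confident, otherwise plan."""
    if confidence(state, q_values) >= threshold:
        return max(q_values[state], key=q_values[state].get)
    plan = plan_forward(state, goal)
    return plan[0] if plan else random.choice(ACTIONS)

if __name__ == "__main__":
    # Toy usage: uninformative Q-values force the agent to plan.
    start, goal = (0, 0), (2, 1)
    q = {start: {a: 0.0 for a in ACTIONS}}
    print(act(start, goal, q))  # first step of the planned path
```

The bidirectional planner the abstract mentions would, under the same assumptions, additionally grow a chain of backward "predictions" from the goal state and let the matcher detect where the forward and backward chains meet; that variant is omitted here to keep the sketch minimal.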
