Forward and Bidirectional Planning Based on Reinforcement Learning and Neural Networks in a Simulated Robot

Building intelligent systems capable of learning, acting reactively, and planning actions before executing them is a major goal of artificial intelligence. This paper presents two reactive and planning systems that contain important novelties with respect to previous neural-network planners and reinforcement-learning-based planners: (a) a new component (the "matcher") allows both planners to perform genuine taskable planning, whereas previous reinforcement-learning-based models used planning only to speed up learning; (b) the planners show for the first time that trained neural-network models of the world can generate long prediction chains with notable robustness to noise; (c) two novel algorithms are presented that generate chains of predictions in order to plan and that control the flow of information between the systems' different neural components; (d) one of the planners uses backward "predictions" to exploit knowledge of the pursued goal; (e) both systems integrate reactive behavior and planning on the basis of a measure of "confidence" in action. The soundness and potential of the two reactive and planning systems are tested and compared on a simulated robot engaged in a stochastic path-finding task. The paper also presents an extensive review of the relevant literature.
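
To make the described control flow concrete, the following is a minimal sketch, in Python, of the kind of loop the abstract outlines: a learned world model chained into forward predictions, a "matcher" that recognizes goal states, and a "confidence" measure that arbitrates between reacting and planning. All names and interfaces here (predict_next, match, confidence, plan_forward, the grid-world dynamics) are illustrative assumptions, not the paper's actual neural architecture or algorithms.

```python
# Hypothetical sketch of confidence-gated forward planning with a learned
# world model and a "matcher". Interfaces and dynamics are assumptions for
# illustration only; the paper's components are neural networks.
import random

ACTIONS = ["north", "south", "east", "west"]

def predict_next(state, action):
    """Stand-in for a trained neural world model: maps (state, action)
    to a predicted next state. Here: deterministic moves on a grid."""
    dx, dy = {"north": (0, 1), "south": (0, -1),
              "east": (1, 0), "west": (-1, 0)}[action]
    return (state[0] + dx, state[1] + dy)

def match(state, goal):
    """The 'matcher': signals whether a (predicted) state satisfies
    the currently pursued goal, making the planner taskable."""
    return state == goal

def confidence(state, q_values):
    """Confidence in reactive action: high when one action clearly
    dominates. Here: gap between the two best action values."""
    ranked = sorted(q_values[state].values(), reverse=True)
    return ranked[0] - ranked[1]

def plan_forward(state, goal, max_depth=20):
    """Chain the world model's one-step predictions into prediction
    sequences, stopping when the matcher recognizes the goal."""
    frontier = [(state, [])]
    visited = {state}
    for _ in range(max_depth):
        next_frontier = []
        for s, plan in frontier:
            for a in ACTIONS:
                s2 = predict_next(s, a)
                if match(s2, goal):
                    return plan + [a]
                if s2 not in visited:
                    visited.add(s2)
                    next_frontier.append((s2, plan + [a]))
        frontier = next_frontier
    return None  # no prediction chain reached the goal within the horizon

def act(state, goal, q_values, threshold=0.1):
    """Arbitration: act reactively when confident, otherwise plan."""
    if confidence(state, q_values) >= threshold:
        return max(q_values[state], key=q_values[state].get)
    plan = plan_forward(state, goal)
    return plan[0] if plan else random.choice(ACTIONS)

if __name__ == "__main__":
    # Toy usage: uninformative Q-values force the agent to plan.
    start, goal = (0, 0), (2, 1)
    q = {start: {a: 0.0 for a in ACTIONS}}
    print(act(start, goal, q))  # first step of the planned path
```

The bidirectional planner the abstract mentions would, under the same assumptions, additionally grow a chain of backward "predictions" from the goal state and let the matcher detect where the forward and backward chains meet; that variant is omitted here to keep the sketch minimal.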
