Living in a partially structured environment: How to bypass the limitations of classical reinforcement techniques

Abstract: In this paper, we propose an unsupervised neural network that allows a robot to learn sensory-motor associations from a delayed reward. The robot's task is to learn the “meaning” of pictograms in order to “survive” in a maze. First, we introduce a new neural conditioning rule, the probabilistic conditioning rule (PCR), which allows hypotheses (associations between visual categories and movements) to be tested over a given time span. Second, we describe a real maze experiment with our mobile robot and propose a neural architecture that overcomes the difficulty of building visual categories dynamically while associating them with movements. Third, we use our algorithm in simulation in order to test it exhaustively; we give results for different kinds of mazes and compare our system to an adapted version of the Q-learning algorithm. Finally, we conclude by showing the limitations of approaches that do not take into account the intrinsic complexity of reasoning based on image recognition.
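
For context, the comparison baseline mentioned in the abstract is Q-learning. The sketch below is a minimal tabular version of that standard algorithm, not the adapted variant used in the paper (whose details are not given here); the maze interface (`reset`, `step`) and the movement set are hypothetical, chosen only for illustration.

```python
import random
from collections import defaultdict

# Minimal tabular Q-learning sketch (textbook form of the baseline the paper
# compares against; the paper's adaptation and its PCR rule are NOT reproduced here).
# States could be maze cells and actions the robot's movements; the reward is
# delayed, i.e. non-zero only when the goal is reached.

ACTIONS = ["north", "south", "east", "west"]  # hypothetical movement set

def q_learning_episode(env, q, alpha=0.1, gamma=0.9, epsilon=0.1):
    """Run one episode on `env`, updating the Q-table `q` in place.

    `env` is assumed to expose reset() -> state and
    step(action) -> (next_state, reward, done).
    """
    state = env.reset()
    done = False
    while not done:
        # epsilon-greedy action selection
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        next_state, reward, done = env.step(action)
        # one-step update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = next_state

# Usage: q = defaultdict(float); call q_learning_episode(maze_env, q) repeatedly.
```

With a delayed reward, the reward term is non-zero only on the step that reaches the goal, so value estimates must propagate backwards over many episodes; this is the difficulty, compounded by the cost of building visual categories online, that the abstract refers to.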
