Living in a partially structured environment: How to bypass the limitations of classical reinforcement techniques

Abstract: In this paper, we propose an unsupervised neural network that allows a robot to learn sensory-motor associations from a delayed reward. The robot's task is to learn the “meaning” of pictograms in order to “survive” in a maze. First, we introduce a new neural conditioning rule, the probabilistic conditioning rule (PCR), which allows hypotheses (associations between visual categories and movements) to be tested over a given time span. Second, we describe a real maze experiment with our mobile robot and propose a neural architecture that overcomes the difficulty of building visual categories dynamically while associating them with movements. Third, we use our algorithm in simulation in order to test it exhaustively; we give results for different kinds of mazes and compare our system to an adapted version of the Q-learning algorithm. Finally, we conclude by showing the limitations of approaches that do not take into account the intrinsic complexity of reasoning based on image recognition.
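
For context, the comparison baseline mentioned in the abstract is Q-learning. The sketch below is a minimal tabular version of that standard algorithm, not the adapted variant used in the paper (whose details are not given here); the maze interface (`reset`, `step`) and the movement set are hypothetical, chosen only for illustration.

```python
import random
from collections import defaultdict

# Minimal tabular Q-learning sketch (textbook form of the baseline the paper
# compares against; the paper's adaptation and its PCR rule are NOT reproduced here).
# States could be maze cells and actions the robot's movements; the reward is
# delayed, i.e. non-zero only when the goal is reached.

ACTIONS = ["north", "south", "east", "west"]  # hypothetical movement set

def q_learning_episode(env, q, alpha=0.1, gamma=0.9, epsilon=0.1):
    """Run one episode on `env`, updating the Q-table `q` in place.

    `env` is assumed to expose reset() -> state and
    step(action) -> (next_state, reward, done).
    """
    state = env.reset()
    done = False
    while not done:
        # epsilon-greedy action selection
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        next_state, reward, done = env.step(action)
        # one-step update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = next_state

# Usage: q = defaultdict(float); call q_learning_episode(maze_env, q) repeatedly.
```

With a delayed reward, the reward term is non-zero only on the step that reaches the goal, so value estimates must propagate backwards over many episodes; this is the difficulty, compounded by the cost of building visual categories online, that the abstract refers to.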
