A Twofold Modular Approach of Reinforcement Learning for Adaptive Intelligent Agents (original French title: Une double approche modulaire de l'apprentissage par renforcement pour des agents intelligents adaptatifs)

This thesis addressed two fields of artificial intelligence: on the one hand, reinforcement learning (RL); on the other, multi-agent systems (MAS). The former makes it possible to design agents (intelligent entities) on the basis of a reinforcement signal that rewards the decisions leading to a fixed goal, whereas the latter concerns the intelligence that can emerge from the interaction of a group of entities (in the perspective that the whole is more than the sum of its parts). Each of these two tools suffers from various practical difficulties. The work we carried out showed how each tool can serve the other in addressing some of these problems. We thus designed the agents of a MAS through RL, and organized the architecture of a reinforcement learning agent as a MAS. These two tools proved to be highly complementary, and our overall approach of a “progressive” (incremental) design demonstrated its effectiveness.
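As a minimal illustration of the reinforcement-signal idea described above, the sketch below runs tabular Q-learning on a toy one-dimensional corridor: the agent is rewarded only when it reaches the goal cell. The environment, reward, and hyperparameters are assumptions chosen for the example, not the experimental setting of the thesis.

```python
import random

N_STATES = 5          # corridor cells 0..4; cell 4 is the goal
ACTIONS = [-1, +1]    # move left or move right
ALPHA, GAMMA = 0.5, 0.9

# Q-table: expected discounted return for each (state, action) pair.
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    """Toy environment: walls reflect; reaching the last cell pays 1."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

random.seed(0)
for _ in range(500):                       # training episodes
    s = 0
    while s != N_STATES - 1:
        # Uniform random exploration; Q-learning is off-policy, so the
        # learned Q-values still converge toward the optimal ones.
        a = random.choice(ACTIONS)
        s2, r = step(s, a)
        best_next = max(Q[(s2, a2)] for a2 in ACTIONS)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = s2

# Greedy policy extracted from the learned Q-table, for states 0..3.
greedy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)]
print(greedy)
```

After training, the greedy policy moves right in every non-goal state, even though the reinforcement signal is only given at the goal: the discounted value propagates backward through the Q-table updates.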
