A Twofold Modular Approach to Reinforcement Learning for Adaptive Intelligent Agents (Une double approche modulaire de l'apprentissage par renforcement pour des agents intelligents adaptatifs)
[1] Olivier Buffet,et al. Learning to weigh basic behaviors in scalable agents , 2002, AAMAS '02.
[2] Sridhar Mahadevan. An Average-Reward Reinforcement Learning Algorithm for Computing Bias-Optimal Policies , 1996, AAAI/IAAI, Vol. 1.
[3] Barbara Hayes-Roth,et al. A Blackboard Architecture for Control , 1985, Artif. Intell..
[4] Leon A. Petrosyan,et al. Game Theory (Second Edition) , 1996 .
[5] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[6] Alan H. Bond,et al. Readings in Distributed Artificial Intelligence , 1988 .
[7] Nicholas R. Jennings,et al. A Roadmap of Agent Research and Development , 2004, Autonomous Agents and Multi-Agent Systems.
[8] Richard W. Prager,et al. A Modular Q-Learning Architecture for Manipulator Task Decomposition , 1994, ICML.
[9] Martin Gardner. Mathematical Games: The Fantastic Combinations of John Conway's New Solitaire Game "Life" , 1970, Scientific American.
[10] John Haugeland,et al. Artificial intelligence - the very idea , 1987 .
[11] Rodney A. Brooks,et al. A Robust Layered Control System for a Mobile Robot , 1986, IEEE Journal of Robotics and Automation.
[12] Olivier Buffet,et al. Incremental reinforcement learning for designing multi-agent systems , 2001, AGENTS '01.
[13] Andrew W. Moore,et al. Efficient memory-based learning for robot control , 1990 .
[14] Rolf Pfeifer,et al. On the role of morphology and materials in adaptive behavior , 2000 .
[15] Kagan Tumer,et al. Collective Intelligence and Braess' Paradox , 2000, AAAI/IAAI.
[16] Olivier Buffet,et al. Apprentissage par renforcement pour la conception de systèmes multi-agents réactifs , 2003, Tech. Sci. Informatiques.
[17] Anne Condon,et al. On the Undecidability of Probabilistic Planning and Infinite-Horizon Partially Observable Markov Decision Problems , 1999, AAAI/IAAI.
[19] John Loch,et al. Using Eligibility Traces to Find the Best Memoryless Policy in Partially Observable Markov Decision Processes , 1998, ICML.
[20] Andrew W. Moore,et al. Variable Resolution Discretization in Optimal Control , 2002, Machine Learning.
[21] Minoru Asada,et al. Purposive Behavior Acquisition for a Real Robot by Vision-Based Reinforcement Learning , 2005, Machine Learning.
[22] Kee-Eung Kim,et al. Learning Finite-State Controllers for Partially Observable Environments , 1999, UAI.
[23] Drew McDermott,et al. Introduction to artificial intelligence , 1986, Addison-Wesley series in computer science.
[24] R. Rescorla,et al. A theory of Pavlovian conditioning : Variations in the effectiveness of reinforcement and nonreinforcement , 1972 .
[25] J J Hopfield,et al. Neural networks and physical systems with emergent collective computational abilities. , 1982, Proceedings of the National Academy of Sciences of the United States of America.
[26] P.-P. Grassé. La reconstruction du nid et les coordinations interindividuelles chez Bellicositermes natalensis et Cubitermes sp. La théorie de la stigmergie : essai d'interprétation du comportement des termites constructeurs , 1959, Insectes Sociaux.
[27] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[28] Bruno Scherrer. Apprentissage de représentation et auto-organisation modulaire pour un agent autonome , 2003 .
[29] Mauro Birattari,et al. Toward the Formal Foundation of Ant Programming , 2002, Ant Algorithms.
[30] Maja J. Mataric,et al. Reward Functions for Accelerated Learning , 1994, ICML.
[32] Gerhard Weiß,et al. Adaptation and Learning in Multi-Agent Systems: Some Remarks and a Bibliography , 1995, Adaption and Learning in Multi-Agent Systems.
[33] Mark Humphreys,et al. Action selection methods using reinforcement learning , 1997 .
[34] Richard S. Sutton,et al. Predictive Representations of State , 2001, NIPS.
[35] Bruce Blumberg,et al. Action-selection in hamsterdam: lessons from ethology , 1994 .
[36] Sandra Clara Gadanho,et al. Asynchronous learning by emotions and cognition , 2002 .
[37] Thomas G. Dietterich. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition , 1999, J. Artif. Intell. Res..
[38] Peter L. Bartlett,et al. Experiments with Infinite-Horizon, Policy-Gradient Estimation , 2001, J. Artif. Intell. Res..
[39] Guillaume J. Laurent,et al. Synthèse de comportements par apprentissages par renforcement parallèles : application à la commande d'un micromanipulateur plan , 2002 .
[40] L. Shapley. SOME TOPICS IN TWO-PERSON GAMES , 1963 .
[41] Arnaud Dury. Modélisation des interactions dans les systèmes multi-agents , 2000 .
[42] Bruno Bettelheim. Psychanalyse des contes de fées , 1976 .
[43] L. Baird. Reinforcement Learning Through Gradient Descent , 1999 .
[44] Ronald J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.
[45] Kee-Eung Kim,et al. Solving Very Large Weakly Coupled Markov Decision Processes , 1998, AAAI/IAAI.
[46] Craig Boutilier,et al. Sequential Optimality and Coordination in Multiagent Systems , 1999, IJCAI.
[47] John K. Slaney,et al. Anytime State-Based Solution Methods for Decision Processes with non-Markovian Rewards , 2002, UAI.
[48] Jörg P. Müller,et al. The agent architecture InteRRaP : concept and application , 1993 .
[49] G. Di Caro,et al. Ant colony optimization: a new meta-heuristic , 1999, Proceedings of the 1999 Congress on Evolutionary Computation-CEC99 (Cat. No. 99TH8406).
[50] B. Skinner,et al. Science and human behavior , 1953 .
[51] Michael I. Jordan,et al. Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems , 1994, NIPS.
[52] Satinder Singh. Transfer of learning by composing solutions of elemental sequential tasks , 2004, Machine Learning.
[53] J. Neumann,et al. Theory of games and economic behavior , 1945, 100 Years of Math Milestones.
[54] A. Cassandra,et al. Exact and approximate algorithms for partially observable markov decision processes , 1998 .
[55] Peter Norvig,et al. Artificial Intelligence: A Modern Approach , 1995 .
[56] Manuela M. Veloso,et al. Multiagent Systems: A Survey from a Machine Learning Perspective , 2000, Auton. Robots.
[57] Michael P. Wellman,et al. Multiagent Reinforcement Learning: Theoretical Framework and an Algorithm , 1998, ICML.
[58] L. Shapley,et al. Stochastic Games* , 1953, Proceedings of the National Academy of Sciences.
[59] Peter L. Bartlett,et al. Infinite-Horizon Policy-Gradient Estimation , 2001, J. Artif. Intell. Res..
[60] Alain Dutech,et al. Apprentissage par renforcement pour les processus décisionnels de Markov partiellement observés Apprendre une extension sélective du passé , 2003, Rev. d'Intelligence Artif..
[61] Robert J. Schalkoff,et al. Artificial Intelligence: An Engineering Approach , 1990 .
[62] Michael I. Jordan,et al. Learning Without State-Estimation in Partially Observable Markovian Decision Processes , 1994, ICML.
[63] Herbert A. Simon,et al. Situated Action: A Symbolic Interpretation , 1993, Cogn. Sci..
[64] E. T. Copson. Asymptotic Expansions: The method of steepest descents , 1965 .
[65] Long Ji Lin,et al. Scaling Up Reinforcement Learning for Robot Control , 1993, International Conference on Machine Learning.
[66] Stevan Harnad,et al. Symbol grounding problem , 1990, Scholarpedia.
[67] Reid G. Smith,et al. The Contract Net Protocol: High-Level Communication and Control in a Distributed Problem Solver , 1980, IEEE Transactions on Computers.
[68] Olivier Buffet,et al. Adaptive Combination of Behaviors in an Agent , 2002, ECAI.
[69] Adam Wolisz,et al. Performance aspects of trading in open distributed systems , 1993, Comput. Commun..
[70] Michael I. Jordan,et al. Technical report, Massachusetts Institute of Technology, Artificial Intelligence Laboratory and Center for Biological and Computational Learning, Department of Brain and Cognitive Sciences , 1996 .
[71] Craig Boutilier,et al. Planning, Learning and Coordination in Multiagent Decision Processes , 1996, TARK.
[72] Andrew McCallum,et al. Reinforcement learning with selective perception and hidden state , 1996 .
[73] Barbara Webb,et al. Swarm Intelligence: From Natural to Artificial Systems , 2002, Connect. Sci..
[74] Chris Drummond,et al. Accelerating Reinforcement Learning by Composing Solutions of Automatically Identified Subtasks , 2011, J. Artif. Intell. Res..
[75] Kagan Tumer,et al. General principles of learning-based multi-agent systems , 1999, AGENTS '99.
[76] Olivier Buffet,et al. Multi-Agent Systems by Incremental Gradient Reinforcement Learning , 2001, IJCAI.
[77] Ben J. A. Kröse,et al. Learning from delayed rewards , 1995, Robotics Auton. Syst..
[78] Balaraman Ravindran,et al. Hierarchical Optimal Control of MDPs , 1998 .
[79] Olivier Buffet,et al. Looking for Scalable Agents , 2000 .
[80] Preben Alstrøm,et al. Learning to Drive a Bicycle Using Reinforcement Learning and Shaping , 1998, ICML.
[81] Richard Ernest Bellman,et al. An Introduction to Artificial Intelligence: Can Computers Think? , 1978 .
[82] Andrew Y. Ng,et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping , 1999, ICML.
[83] Dimitri P. Bertsekas,et al. Dynamic Programming: Deterministic and Stochastic Models , 1987 .
[84] M. Littman. The Witness Algorithm: Solving Partially Observable Markov Decision Processes , 1994 .
[85] Kee-Eung Kim,et al. Learning to Cooperate via Policy Search , 2000, UAI.
[86] Satinder P. Singh,et al. How to Dynamically Merge Markov Decision Processes , 1997, NIPS.
[87] Makram Bouzid. Contribution à la modélisation de l'interaction agent / environnement : modélisation stochastique et simulation parallèle , 2001 .
[88] Toby Tyrrell,et al. Computational mechanisms for action selection , 1993 .
[89] Alain Dutech. Apprentissage d'environnement : approches cognitives et comportementales , 1999 .
[90] K. R. Dixon,et al. Incorporating Prior Knowledge and Previously Learned Information into Reinforcement Learning Agents , 2000 .
[91] Michael K. Sahota. Action selection for robots in dynamic environments through inter-behaviour bidding , 1994 .
[92] Lotfi A. Zadeh,et al. Outline of a New Approach to the Analysis of Complex Systems and Decision Processes , 1973, IEEE Trans. Syst. Man Cybern..
[93] Sridhar Mahadevan,et al. Rapid Task Learning for Real Robots , 1993 .
[94] Nils J. Nilsson,et al. Artificial Intelligence , 1974, IFIP Congress.
[95] Andrew W. Moore,et al. Reinforcement Learning: A Survey , 1996, J. Artif. Intell. Res..
[96] Olivier Buffet,et al. Automatic generation of an agent's basic behaviors , 2003, AAMAS '03.
[97] Richard S. Sutton,et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming , 1990, ML.
[98] George Luger,et al. Artificial Intelligence: Structures and Strategies for Complex Problem Solving (5th Edition) , 2004 .
[99] J. Piaget,et al. La psychologie de l'intelligence , 1949 .
[100] Vincent Thomas,et al. MAS and RATS : Multi-agent simulation of social differentiation in rats' groups. , 2002 .
[101] James L. McClelland,et al. Autonomous Mental Development by Robots and Animals , 2001, Science.
[102] Rodney A. Brooks,et al. Intelligence Without Reason , 1991, IJCAI.
[103] Neil Immerman,et al. The Complexity of Decentralized Control of Markov Decision Processes , 2000, UAI.
[104] Gerald Tesauro,et al. Practical Issues in Temporal Difference Learning , 1992, Mach. Learn..
[105] M. Benda,et al. On Optimal Cooperation of Knowledge Sources , 1985 .
[106] Gerald Tesauro,et al. TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play , 1994, Neural Computation.
[107] Sean R Eddy,et al. What is dynamic programming? , 2004, Nature Biotechnology.
[108] Raymond Kurzweil,et al. Age of intelligent machines , 1990 .
[109] Olivier Buffet,et al. Apprentissage par renforcement dans un système multi-agents , 2000 .
[110] Michael P. Wellman,et al. Online learning about other agents in a dynamic multiagent system , 1998, AGENTS '98.
[111] Andrew W. Moore,et al. Gradient Descent for General Reinforcement Learning , 1998, NIPS.
[112] Stewart W. Wilson. The animat path to AI , 1991 .
[113] Juyang Weng,et al. A theory for mentally developing robots , 2002, Proceedings 2nd International Conference on Development and Learning. ICDL 2002.
[114] Maja J. Mataric,et al. Reinforcement Learning in the Multi-Robot Domain , 1997, Auton. Robots.
[115] Jörg P. Müller,et al. Control Architectures for Autonomous and Interacting Agents: A Survey , 1996, PRICAI Workshop on Intelligent Agent Systems.
[116] Jette Randløv,et al. Shaping in Reinforcement Learning by Changing the Physics of the Problem , 2000, ICML.
[117] Julio Rosenblatt,et al. DAMN: a distributed architecture for mobile navigation , 1997, J. Exp. Theor. Artif. Intell..
[118] Innes A. Ferguson. TouringMachines: an architecture for dynamic, rational, mobile agents , 1992 .
[119] Michael Wooldridge,et al. Agent Theories, Architectures, and Languages: A Bibliography , 1995, ATAL.
[120] Marco Colombetti,et al. Robot shaping: The Hamster Experiment , 1996 .
[121] Leslie Pack Kaelbling,et al. Learning Policies with External Memory , 1999, ICML.
[122] Michael L. Littman,et al. Markov Games as a Framework for Multi-Agent Reinforcement Learning , 1994, ICML.
[123] R. A. Brooks,et al. Intelligence without Representation , 1991, Artif. Intell..
[124] Vincent Chevrier,et al. A new swarm mechanism based on social spiders colonies: From web weaving to region detection , 2003, Web Intell. Agent Syst..