Large-scale dynamic optimization using teams of reinforcement learning agents
[1] Arthur L. Samuel,et al. Some Studies in Machine Learning Using the Game of Checkers , 1967, IBM J. Res. Dev..
[2] Karl Johan Åström,et al. Optimal control of Markov processes with incomplete state information , 1965 .
[3] H. Witsenhausen. A Counterexample in Stochastic Optimum Control , 1968 .
[4] H. Witsenhausen. Separation of estimation and control for discrete time systems , 1971 .
[5] R. Radner,et al. Economic theory of teams , 1972 .
[6] M. L. Tsetlin,et al. Automaton theory and modeling of biological systems , 1973 .
[7] M. Yadin,et al. Optimal control of elevators , 1977 .
[8] J. Walrand,et al. On delayed sharing patterns , 1978 .
[9] C.C. White,et al. Dynamic programming and stochastic control , 1978, Proceedings of the IEEE.
[10] Yu-Chi Ho. Team decision theory and information structures , 1980, Proceedings of the IEEE.
[11] B. Chandrasekaran,et al. Natural and Social System Metaphors for Distributed Problem Solving: Introduction to the Issue , 1981, IEEE Transactions on Systems, Man, and Cybernetics.
[12] R. Aumann. Survey of Repeated Games , 1981 .
[13] S. Lakshmivarahan,et al. Learning Algorithms for Two-Person Zero-Sum Stochastic Games with Incomplete Information , 1981, Math. Oper. Res..
[14] K. Narendra,et al. Learning Algorithms for Two-Person Zero-Sum Stochastic Games with Incomplete Information: A Unified Approach , 1982 .
[15] S. Marcus,et al. Decentralized control of finite state Markov processes , 1982 .
[16] Randall Davis,et al. Negotiation as a Metaphor for Distributed Problem Solving , 1983, Artif. Intell..
[17] George R. Strakosch,et al. Vertical Transportation: Elevators and Escalators , 1983 .
[18] Richard S. Sutton,et al. Neuronlike adaptive elements that can solve difficult learning control problems , 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[19] W. Hamilton,et al. The Evolution of Cooperation , 1984 .
[20] P. Anandan,et al. Pattern-recognizing stochastic learning automata , 1985, IEEE Transactions on Systems, Man, and Cybernetics.
[21] A. G. Barto,et al. Learning by statistical cooperation of self-interested neuron-like computing elements , 1985, Human neurobiology.
[22] James L. McClelland,et al. Parallel distributed processing: explorations in the microstructure of cognition, vol. 1: foundations , 1986 .
[23] Dimitri P. Bertsekas,et al. Dynamic Programming: Deterministic and Stochastic Models , 1987 .
[24] John N. Tsitsiklis,et al. The Complexity of Markov Decision Processes , 1987, Math. Oper. Res..
[25] M. Aicardi,et al. Decentralized optimal control of Markov chains with a common past information set , 1987 .
[26] Edmund H. Durfee,et al. Coordination of distributed problem solvers , 1988 .
[27] H. Ujihara,et al. The Revolutionary AI-2100 Elevator-Group Control System and the New Intelligent Option Series , 1988 .
[28] Andrew G. Barto,et al. From Chemotaxis to cooperativity: abstract exercises in neuronal learning strategies , 1989 .
[29] Kumpati S. Narendra,et al. Learning automata - an introduction , 1989 .
[30] John N. Tsitsiklis,et al. Parallel and distributed computation , 1989 .
[31] Francis Crick,et al. The recent excitement about neural networks , 1989, Nature.
[32] A. Barto,et al. Learning and Sequential Decision Making , 1989 .
[33] H. Sabourian. Repeated Games: A Survey , 1989 .
[34] Rodney A. Brooks,et al. Learning to Coordinate Behaviors , 1990, AAAI.
[35] F. Hahn. The Economics of missing markets, information, and games , 1990 .
[36] Dana H. Ballard,et al. Active Perception and Reinforcement Learning , 1990, Neural Computation.
[37] Ramanathan V. Guha,et al. CYC: A Midterm Report , 1990, AI Mag..
[38] Richard S. Sutton,et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming , 1990, ML.
[39] Ming Tan,et al. Learning a Cost-Sensitive Internal Representation for Reinforcement Learning , 1991, ML.
[40] Edmund H. Durfee,et al. The Distributed Artificial Intelligence Melting Pot , 1991 .
[41] Leslie Pack Kaelbling,et al. Input Generalization in Delayed Reinforcement Learning: An Algorithm and Performance Comparisons , 1991, IJCAI.
[42] Eithan Ephrati,et al. The Clarke Tax as a Consensus Mechanism Among Automated Agents , 1991, AAAI.
[43] Hiromi Inaba,et al. An elevator characterized group supervisory control system , 1991, Proceedings IECON '91: 1991 International Conference on Industrial Electronics, Control and Instrumentation.
[44] W. Lovejoy. A survey of algorithmic methods for partially observed Markov decision processes , 1991 .
[45] Geoffrey E. Hinton,et al. Adaptive Mixtures of Local Experts , 1991, Neural Computation.
[46] Rich Caruana,et al. Intelligent Agent Design Issues: Internal Agent State and Incomplete Perception , 1991 .
[47] Grantham K. H. Pang. Elevator scheduling system using blackboard architecture , 1991 .
[48] Geoffrey E. Hinton,et al. Feudal Reinforcement Learning , 1992, NIPS.
[49] Anne H. Soukhanov,et al. The American Heritage Dictionary of the English Language , 1992 .
[50] Seppo J. Ovaska,et al. Electronics and information technology in high-range elevator systems , 1992 .
[51] Andrew W. Moore,et al. Memory-Based Reinforcement Learning: Efficient Computation with Prioritized Sweeping , 1992, NIPS.
[52] Brahim Chaib-draa,et al. Distributed artificial intelligence: an annotated bibliography , 1992, SGAR.
[53] James Alan Lewis,et al. A dynamic load balancing approach to the control of multi-server polling systems with applications to elevator system dispatching , 1992 .
[54] Moshe Tennenholtz,et al. Emergent Conventions in Multi-Agent Systems: Initial Experimental Results and Observations (Preliminary Report) , 1992, KR.
[55] Toshimitsu Tobita,et al. An online tuning method for multiobjective control of elevator group , 1992, Proceedings of the 1992 International Conference on Industrial Electronics, Control, Instrumentation, and Automation.
[56] Vijaykumar Gullapalli,et al. Reinforcement learning and its application to control , 1992 .
[57] Mark B. Ring. Learning Sequential Tasks by Incrementally Adding Higher Orders , 1992, NIPS.
[58] Long Lin,et al. Memory Approaches to Reinforcement Learning in Non-Markovian Domains , 1992 .
[59] Michael L. Littman,et al. A Distributed Reinforcement Learning Scheme for Network Routing , 1993 .
[60] Suresh K. Khator,et al. Smart lifts: controls design and performance evaluation , 1993 .
[61] Jude W. Shavlik,et al. Learning Symbolic Rules Using Artificial Neural Networks , 1993, ICML.
[62] Andrew McCallum,et al. Overcoming Incomplete Perception with Utile Distinction Memory , 1993, ICML.
[63] Rich Caruana,et al. Learning Many Related Tasks at the Same Time with Backpropagation , 1994, NIPS.
[64] Gerald Tesauro,et al. TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play , 1994, Neural Computation.
[65] Edwin K. P. Chong,et al. Discrete event systems: Modeling and performance analysis , 1994, Discret. Event Dyn. Syst..
[66] Richard W. Prager,et al. A Modular Q-Learning Architecture for Manipulator Task Decomposition , 1994, ICML.
[67] G. Kane. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol 1: Foundations, vol 2: Psychological and Biological Models , 1994 .
[68] Mahesan Niranjan,et al. On-line Q-learning using connectionist systems , 1994 .
[69] Michael L. Littman,et al. Markov Games as a Framework for Multi-Agent Reinforcement Learning , 1994, ICML.
[70] Andrew McCallum,et al. Instance-Based State Identification for Reinforcement Learning , 1994, NIPS.
[71] Michael O. Duff,et al. Reinforcement Learning Methods for Continuous-Time Markov Decision Problems , 1994, NIPS.
[72] Michael I. Jordan,et al. Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems , 1994, NIPS.
[73] Sandip Sen,et al. Learning to Coordinate without Sharing Information , 1994, AAAI.
[74] Andrew G. Barto,et al. An Actor/Critic Algorithm that is Equivalent to Q-Learning , 1994, NIPS.
[75] Gerhard Weiss,et al. Some Studies in Distributed Machine Learning and Organizational Design , 1994 .
[76] Hajime Kita,et al. Adaptive Optimal Elevator Group Control by Use of Neural Networks , 1994 .
[77] Gerald Tesauro,et al. Temporal Difference Learning and TD-Gammon , 1995, J. Int. Comput. Games Assoc..
[78] Nicholas R. Jennings,et al. Controlling Cooperative Problem Solving in Industrial Multi-Agent Systems Using Joint Intentions , 1995, Artif. Intell..
[79] Victor R. Lesser,et al. Learning Coordination Plans in Distributed Problem-Solving Environments , 1995, ICMAS.
[80] Mark Humphrys. W-learning: Competition among selfish Q-learners , 1995 .
[81] Maja J. Mataric,et al. Issues and approaches in the design of collective autonomous agents , 1995, Robotics Auton. Syst..
[82] Ben J. A. Kröse,et al. Learning from delayed rewards , 1995, Robotics Auton. Syst..
[83] Maja J. Mataric,et al. Learning in Multi-Robot Systems , 1995, Adaption and Learning in Multi-Agent Systems.
[84] Gerhard Weiß,et al. Adaptation and Learning in Multi-Agent Systems: Some Remarks and a Bibliography , 1995, Adaption and Learning in Multi-Agent Systems.
[85] Michael Luck,et al. Proceedings of the First International Conference on Multi-Agent Systems , 1995 .
[86] Mandayam A. L. Thathachar,et al. Local and Global Optimization Algorithms for Generalized Learning Automata , 1995, Neural Computation.
[87] Sandip Sen,et al. Adaption and Learning in Multi-Agent Systems , 1995, Lecture Notes in Computer Science.
[88] Andrew G. Barto,et al. Improving Elevator Performance Using Reinforcement Learning , 1995, NIPS.
[89] Victor Lesser,et al. Learning Experiments in a Heterogeneous Multi-agent System , 1995 .
[90] Andrew G. Barto,et al. Learning to Act Using Real-Time Dynamic Programming , 1995, Artif. Intell..
[91] Christos G. Cassandras,et al. Application of Q-learning to elevator dispatching , 1996 .
[92] Robert H. Crites,et al. Multiagent reinforcement learning in the Iterated Prisoner's Dilemma , 1996, Bio Systems.
[93] C. Cassandras,et al. Optimal dispatching control for elevator systems during uppeak traffic , 1996, Proceedings of 35th IEEE Conference on Decision and Control.
[94] Michael L. Littman,et al. Algorithms for Sequential Decision Making , 1996 .
[95] Victor Lesser,et al. Learning Situation-specific Coordination in Generalized Partial Global Planning , 1996 .
[96] K. Khalil. On the Complexity of Decentralized Decision Making and Detection Problems , 2022 .