Multi-Agent Learning with the Success-Story Algorithm
[1] Mark S. Boddy,et al. Deliberation Scheduling for Problem Solving in Time-Constrained Environments , 1994, Artif. Intell..
[2] Andrew McCallum,et al. Overcoming Incomplete Perception with Utile Distinction Memory , 1993, ICML.
[3] Steven Douglas Whitehead,et al. Reinforcement learning for the adaptive control of perception and action , 1992 .
[4] Russell Greiner,et al. PALO: A Probabilistic Hill-Climbing Algorithm , 1996, Artif. Intell..
[5] Donald A. Berry,et al. Bandit Problems: Sequential Allocation of Experiments. , 1986 .
[6] R. J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.
[7] Sandip Sen,et al. Evolution and learning in multiagent systems , 1998, Int. J. Hum. Comput. Stud..
[8] Michael I. Jordan,et al. Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems , 1994, NIPS.
[9] Juergen Schmidhuber,et al. A General Method For Incremental Self-Improvement And Multi-Agent Learning In Unrestricted Environme , 1999 .
[10] Jürgen Schmidhuber,et al. A General Method for Multi-Agent Reinforcement Learning in Unrestricted Environments , 1996 .
[11] Leslie Pack Kaelbling,et al. Learning Policies for Partially Observable Environments: Scaling Up , 1997, ICML.
[12] Pattie Maes,et al. Incremental Self-Improvement for Life-Time Multi-Agent Reinforcement Learning , 1996 .
[13] Stuart J. Russell,et al. Principles of Metareasoning , 1989, Artif. Intell..
[14] J. Davenport. Editor , 1960 .
[15] P. W. Jones,et al. Bandit Problems, Sequential Allocation of Experiments , 1987 .
[16] Mark B. Ring. Continual learning in reinforcement environments , 1995, GMD-Bericht.
[17] Thomas G. Dietterich. What is machine learning? , 2020, Archives of Disease in Childhood.
[18] Jürgen Schmidhuber,et al. Solving POMDPs with Levin Search and EIRA , 1996, ICML.
[19] Sandip Sen,et al. Adaption and Learning in Multi-Agent Systems , 1995, Lecture Notes in Computer Science.
[20] J. Bather,et al. Multi‐Armed Bandit Allocation Indices , 1990 .
[21] Jieyu Zhao,et al. Simple Principles of Metalearning , 1996 .
[22] Jürgen Schmidhuber,et al. Reinforcement Learning in Markovian and Non-Markovian Environments , 1990, NIPS.
[23] Pravin Varaiya,et al. Stochastic Systems: Estimation, Identification, and Adaptive Control , 1986 .
[24] Leslie Pack Kaelbling,et al. Planning and Acting in Partially Observable Stochastic Domains , 1998, Artif. Intell..
[25] Juergen Schmidhuber,et al. Incremental self-improvement for life-time multi-agent reinforcement learning , 1996 .