Direct Policy Search and Uncertain Policy Evaluation