Policy Search using Paired Comparisons
[1] John A. Nelder, et al. A Simplex Method for Function Minimization, 1965, Comput. J.
[2] J. S. Hunter, et al. Statistics for experimenters: an introduction to design, data analysis, and model building, 1979.
[3] Andrew W. Moore, et al. Efficient Algorithms for Minimizing Cross Validation Error, 1994, ICML.
[4] Ben J. A. Kröse, et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.
[5] Mance E. Harmon, et al. Multi-Agent Residual Advantage Learning with General Function Approximation, 1996.
[6] Margaret H. Wright, et al. Direct search methods: Once scorned, now respectable, 1996.
[7] Astro Teller, et al. Automatically Choosing the Number of Fitness Cases: The Rational Allocation of Trials, 1997.
[8] Rainer Storn, et al. Differential Evolution – A Simple and Efficient Heuristic for global Optimization over Continuous Spaces, 1997, J. Glob. Optim.
[9] R. Storn, et al. Differential Evolution - A simple and efficient adaptive scheme for global optimization over continuous spaces, 2004.
[10] Andrew G. Barto, et al. Reinforcement learning, 1998.
[11] M. J. D. Powell, et al. Direct search algorithms for optimization calculations, 1998, Acta Numerica.
[12] Jieyu Zhao, et al. Direct Policy Search and Uncertain Policy Evaluation, 1998.
[13] Andrew W. Moore, et al. Variable Resolution Discretization for High-Accuracy Solutions of Optimal Control Problems, 1999, IJCAI.
[14] L. Baird. Reinforcement Learning Through Gradient Descent, 1999.
[15] C. T. Kelley, et al. Detection and Remediation of Stagnation in the Nelder-Mead Algorithm Using a Sufficient Decrease Condition, 1999, SIAM J. Optim.
[16] Andrew Y. Ng, et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping, 1999, ICML.
[17] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[18] Michael I. Jordan, et al. PEGASUS: A policy search method for large MDPs and POMDPs, 2000, UAI.
[19] J. Baxter, et al. Direct gradient-based reinforcement learning, 2000, 2000 IEEE International Symposium on Circuits and Systems. Emerging Technologies for the 21st Century. Proceedings (IEEE Cat No.00CH36353).
[20] Andrew W. Moore, et al. Learning Evaluation Functions to Improve Optimization by Local Search, 2001, J. Mach. Learn. Res.
[21] Andrew W. Moore, et al. Direct Policy Search using Paired Statistical Tests, 2001, ICML.
[22] Isaac E. Lagaris, et al. Training Reinforcement Neurocontrollers Using the Polytope Algorithm, 1998, Neural Processing Letters.