[1] Michael L. Littman,et al. Measuring and Characterizing Generalization in Deep Reinforcement Learning , 2018, Applied AI Letters.
[2] Richard S. Sutton,et al. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding , 1996 .
[3] Iryna Gurevych,et al. Reporting Score Distributions Makes a Difference: Performance Study of LSTM-networks for Sequence Tagging , 2017, EMNLP.
[4] Philip J. Fleming,et al. How not to lie with statistics: the correct way to summarize benchmark results , 1986, CACM.
[5] Joelle Pineau,et al. RE-EVALUATE: Reproducibility in Evaluating Reinforcement Learning Algorithms , 2018 .
[6] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[7] Philip Bachman,et al. Deep Reinforcement Learning that Matters , 2017, AAAI.
[8] Paul Van Dooren,et al. Maximizing PageRank via outlinks , 2007, ArXiv.
[9] Marlos C. Machado,et al. Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents , 2017, J. Artif. Intell. Res..
[10] Peter Henderson,et al. Reproducibility of Benchmarked Deep Reinforcement Learning Tasks for Continuous Control , 2017, ArXiv.
[11] Majid Nili Ahmadabadi,et al. Interaction of Culture-based Learning and Cooperative Co-evolution and its Application to Automatic Behavior-based System Design , 2010, IEEE Transactions on Evolutionary Computation.
[12] Shimon Whiteson,et al. Report on the 2008 Reinforcement Learning Competition , 2010, AI Mag..
[13] Alborz Geramifard,et al. RLPy: a value-function-based reinforcement learning framework for education and research , 2015, J. Mach. Learn. Res..
[14] Joelle Pineau,et al. Improving Reproducibility in Machine Learning Research (A Report from the NeurIPS 2019 Reproducibility Program) , 2020, J. Mach. Learn. Res..
[15] J. Kiefer,et al. Asymptotic Minimax Character of the Sample Distribution Function and of the Classical Multinomial Estimator , 1956 .
[16] John Foley,et al. ToyBox: Better Atari Environments for Testing Reinforcement Learning Agents , 2018, ArXiv.
[17] T. W. Anderson. Confidence Limits for the Expected Value of an Arbitrary Bounded Random Variable with a Continuous Distribution Function , 1969 .
[18] Philip Thomas,et al. Bias in Natural Actor-Critic Algorithms , 2014, ICML.
[19] Balázs Csanád Csáji,et al. PageRank optimization by edge selection , 2009, Discret. Appl. Math..
[20] Christos Dimitrakakis,et al. The Reinforcement Learning Competition 2014 , 2014, AI Mag..
[21] Catherine C. McGeoch. A Guide to Experimental Algorithmics , 2012 .
[22] Razvan V. Florian,et al. Correct equations for the dynamics of the cart-pole system , 2005 .
[23] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[24] Zachary C. Lipton,et al. Troubling Trends in Machine Learning Scholarship , 2018, ACM Queue.
[25] Gabriel Dulac-Arnold,et al. Challenges of Real-World Reinforcement Learning , 2019, ArXiv.
[26] Ronald J. Williams,et al. Tight Performance Bounds on Greedy Policies Based on Imperfect Value Functions , 1993 .
[27] André da Motta Salles Barreto,et al. Probabilistic performance profiles for the experimental evaluation of stochastic algorithms , 2010, GECCO '10.
[28] Jon D. McAuliffe,et al. Uniform, nonparametric, non-asymptotic confidence sequences , 2018 .
[29] Pierre-Yves Oudeyer,et al. How Many Random Seeds? Statistical Power Analysis in Deep Reinforcement Learning Experiments , 2018, ArXiv.
[30] Alan Edelman,et al. Julia: A Fresh Approach to Numerical Computing , 2014, SIAM Rev..
[31] Will Dabney,et al. Adaptive Step-Sizes for Reinforcement Learning , 2014 .
[32] W. Bruce Croft,et al. Distributed Evaluations: Ending Neural Point Metrics , 2018, ArXiv.
[33] Stéphane Gaubert,et al. Ergodic Control and Polyhedral Approaches to PageRank Optimization , 2010, IEEE Transactions on Automatic Control.
[34] George Konidaris,et al. Value Function Approximation in Reinforcement Learning Using the Fourier Basis , 2011, AAAI.
[35] Christos Dimitrakakis,et al. The reinforcement learning competition , 2014 .
[36] Rudolf Fleischer,et al. Experimental Algorithmics, From Algorithm Design to Robust and Efficient Software [Dagstuhl seminar, September 2000] , 2002 .
[37] Shimon Whiteson,et al. The Reinforcement Learning Competitions , 2010 .
[38] P. Massart. The Tight Constant in the Dvoretzky-Kiefer-Wolfowitz Inequality , 1990 .
[39] David R. Cox,et al. The Oxford Dictionary of Statistical Terms , 2006 .
[40] Alec Radford,et al. Proximal Policy Optimization Algorithms , 2017, ArXiv.
[41] Mario Lucic,et al. Are GANs Created Equal? A Large-Scale Study , 2017, NeurIPS.
[42] Shimon Whiteson,et al. Protecting against evaluation overfitting in empirical reinforcement learning , 2011, 2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL).
[43] Sergey Levine,et al. The Mirage of Action-Dependent Baselines in Reinforcement Learning , 2018, ICML.
[44] John N. Hooker,et al. Testing heuristics: We have it all wrong , 1995, J. Heuristics.
[45] Doina Precup,et al. A Convergent Form of Approximate Policy Iteration , 2002, NIPS.
[46] Christos H. Papadimitriou,et al. α-Rank: Multi-Agent Evaluation by Evolution , 2019, Scientific Reports.
[47] Shimon Whiteson,et al. Introduction to the special issue on empirical evaluations in reinforcement learning , 2011, Machine Learning.
[48] Jorge J. Moré,et al. Benchmarking optimization software with performance profiles (DOI 10.1007/s101070100263) , 2001 .
[49] Chris Dyer,et al. On the State of the Art of Evaluation in Neural Language Models , 2017, ICLR.
[50] Uta Boehm,et al. Experimental Algorithmics From Algorithm Design To Robust And Efficient Software , 2016 .
[51] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[52] Patrick M. Pilarski,et al. Model-Free reinforcement learning with continuous action in practice , 2012, 2012 American Control Conference (ACC).
[53] Marc G. Bellemare,et al. A Comparative Analysis of Expected and Distributional Reinforcement Learning , 2019, AAAI.
[54] Marc G. Bellemare,et al. The Arcade Learning Environment: An Evaluation Platform for General Agents , 2012, J. Artif. Intell. Res..
[55] Scott M. Jordan. Using Cumulative Distribution Based Performance Analysis to Benchmark Models , 2018 .
[56] Leemon C. Baird,et al. Residual Algorithms: Reinforcement Learning with Function Approximation , 1995, ICML.
[57] Andrew G. Barto,et al. Skill Discovery in Continuous Reinforcement Learning Domains using Skill Chaining , 2009, NIPS.
[58] Rajeev Motwani,et al. The PageRank Citation Ranking: Bringing Order to the Web , 1999, WWW 1999.
[59] Marco Wiering,et al. Convergence and Divergence in Standard and Averaging Reinforcement Learning , 2004, ECML.
[60] Kaleigh Clary,et al. Exploratory Not Explanatory: Counterfactual Analysis of Saliency Maps for Deep Reinforcement Learning , 2020, ICLR.
[61] Michal Valko,et al. Multiagent Evaluation under Incomplete Information , 2019, NeurIPS.