[1] Csaba Szepesvári,et al. Fitted Q-iteration in continuous action-space MDPs , 2007, NIPS.
[2] John Darzentas,et al. Problem Complexity and Method Efficiency in Optimization , 1983 .
[3] Ann Nowé,et al. Multi-objective reinforcement learning using sets of pareto dominating policies , 2014, J. Mach. Learn. Res.
[4] Sergey Levine,et al. Guided Policy Search via Approximate Mirror Descent , 2016, NIPS.
[5] Miroslav Dudík,et al. Optimal and Adaptive Off-policy Evaluation in Contextual Bandits , 2016, ICML.
[6] Rémi Munos,et al. Error Bounds for Approximate Policy Iteration , 2003, ICML.
[7] Alessandro Lazaric,et al. Transfer from Multiple MDPs , 2011, NIPS.
[8] Martin Zinkevich,et al. Online Convex Programming and Generalized Infinitesimal Gradient Ascent , 2003, ICML.
[9] David Silver,et al. Deep Reinforcement Learning with Double Q-Learning , 2015, AAAI.
[10] Pieter Abbeel,et al. Constrained Policy Optimization , 2017, ICML.
[11] Alessandro Lazaric,et al. Finite-Sample Analysis of LSTD , 2010, ICML.
[12] D. Bertsekas. Approximate policy iteration: a survey and some new methods , 2011 .
[13] Marco Pavone,et al. Chance-constrained dynamic programming with application to risk-aware robotic space exploration , 2015, Auton. Robots.
[14] J. Andrew Bagnell,et al. Modeling Purposeful Adaptive Behavior with the Principle of Maximum Causal Entropy , 2010 .
[15] Shai Shalev-Shwartz,et al. Online Learning and Online Convex Optimization , 2012, Found. Trends Mach. Learn.
[16] Alessandro Lazaric,et al. Finite-sample analysis of least-squares policy iteration , 2012, J. Mach. Learn. Res.
[17] Yuval Tassa,et al. Safe Exploration in Continuous Action Spaces , 2018, ArXiv.
[18] Philip S. Thomas,et al. Using Options and Covariance Testing for Long Horizon Off-Policy Policy Evaluation , 2017, NIPS.
[19] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[20] John Langford,et al. A Reductions Approach to Fair Classification , 2018, ICML.
[21] Peter L. Bartlett,et al. Efficient agnostic learning of neural networks with bounded fan-in , 1996, IEEE Trans. Inf. Theory.
[22] Nan Jiang,et al. Doubly Robust Off-policy Value Evaluation for Reinforcement Learning , 2015, ICML.
[23] Anind K. Dey,et al. Maximum Entropy Inverse Reinforcement Learning , 2008, AAAI.
[24] Yisong Yue,et al. Smooth Imitation Learning for Online Sequence Prediction , 2016, ICML.
[25] Reid G. Simmons,et al. The Effect of Representation and Knowledge on Goal-Directed Exploration with Reinforcement-Learning Algorithms , 2005, Machine Learning.
[26] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[27] Sanjoy Dasgupta,et al. Off-Policy Temporal Difference Learning with Function Approximation , 2001, ICML.
[28] Yann LeCun,et al. Model-Predictive Policy Learning with Uncertainty Regularization for Driving in Dense Traffic , 2019, ICLR.
[29] Thorsten Joachims,et al. Batch learning from logged bandit feedback through counterfactual risk minimization , 2015, J. Mach. Learn. Res.
[30] Sergey Levine,et al. Reinforcement Learning with Deep Energy-Based Policies , 2017, ICML.
[31] Shie Mannor,et al. Regularized Policy Iteration , 2008, NIPS.
[32] Manfred K. Warmuth,et al. Exponentiated Gradient Versus Gradient Descent for Linear Predictors , 1997, Inf. Comput.
[33] Martin A. Riedmiller,et al. Batch Reinforcement Learning , 2012, Reinforcement Learning.
[34] Qiang Liu,et al. Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation , 2018, NeurIPS.
[35] Liming Xiang,et al. Kernel-Based Reinforcement Learning , 2006, ICIC.
[36] Mehrdad Farajtabar,et al. More Robust Doubly Robust Off-policy Evaluation , 2018, ICML.
[37] Eric R. Ziegel,et al. The Elements of Statistical Learning , 2003, Technometrics.
[38] E. Rowland. Theory of Games and Economic Behavior , 1946, Nature.
[39] Csaba Szepesvári,et al. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path , 2006, Machine Learning.
[40] P. Bougerol,et al. Strict Stationarity of Generalized Autoregressive Processes , 1992 .
[41] Swarat Chaudhuri,et al. Control Regularization for Reduced Variance Reinforcement Learning , 2019, ICML.
[42] Doina Precup,et al. Eligibility Traces for Off-Policy Policy Evaluation , 2000, ICML.
[43] John Langford,et al. Doubly Robust Policy Evaluation and Learning , 2011, ICML.
[44] Javier García,et al. A comprehensive survey on safe reinforcement learning , 2015, J. Mach. Learn. Res.
[45] Sergey Levine,et al. Learning Neural Network Policies with Guided Policy Search under Unknown Dynamics , 2014, NIPS.
[46] Martin A. Riedmiller. Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method , 2005, ECML.
[47] Masahiro Ono,et al. Chance-Constrained Optimal Path Planning With Obstacles , 2011, IEEE Transactions on Robotics.
[48] Satinder Singh,et al. Self-Imitation Learning , 2018, ICML.
[49] Philip S. Thomas,et al. Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning , 2016, ICML.
[50] Adam Krzyzak,et al. A Distribution-Free Theory of Nonparametric Regression , 2002, Springer series in statistics.
[51] Martin A. Riedmiller,et al. Reinforcement learning for robot soccer , 2009, Auton. Robots.
[52] David Haussler,et al. Sphere Packing Numbers for Subsets of the Boolean n-Cube with Bounded Vapnik-Chervonenkis Dimension , 1995, J. Comb. Theory, Ser. A.
[53] Y. Freund,et al. Adaptive game playing using multiplicative weights , 1999 .
[54] Alessandro Lazaric,et al. Finite-sample Analysis of Bellman Residual Minimization , 2010, ACML.
[55] Byron Boots,et al. Accelerating Imitation Learning with Predictive Models , 2018, AISTATS.
[56] Michail G. Lagoudakis,et al. Least-Squares Policy Iteration , 2003, J. Mach. Learn. Res.
[57] Abbas Mehrabian,et al. Nearly-tight VC-dimension bounds for piecewise linear neural networks , 2017, COLT.
[58] Csaba Szepesvári,et al. Finite-Time Bounds for Fitted Value Iteration , 2008, J. Mach. Learn. Res.
[59] J. Andrew Bagnell,et al. Reinforcement and Imitation Learning via Interactive No-Regret Learning , 2014, ArXiv.
[60] Tom Schaul,et al. Deep Q-learning From Demonstrations , 2017, AAAI.
[61] Yuval Tassa,et al. Continuous control with deep reinforcement learning , 2015, ICLR.
[62] John Langford,et al. Approximately Optimal Approximate Reinforcement Learning , 2002, ICML.
[63] Rémi Munos,et al. Performance Bounds in Lp-norm for Approximate Value Iteration , 2007, SIAM J. Control. Optim.
[64] Pierre Geurts,et al. Tree-Based Batch Mode Reinforcement Learning , 2005, J. Mach. Learn. Res.
[65] Hervé Frezza-Buet,et al. Sample-efficient batch reinforcement learning for dialogue management optimization , 2011, TSLP.
[66] Shimon Whiteson,et al. A Survey of Multi-Objective Sequential Decision-Making , 2013, J. Artif. Intell. Res.