Lipschitz Continuity in Model-based Reinforcement Learning
Kavosh Asadi | Dipendra Misra | Michael L. Littman
[1] Alfred Müller,et al. Optimal selection from distributions with unknown parameters: Robustness of Bayesian models , 1996, Math. Methods Oper. Res..
[2] Marc G. Bellemare,et al. A Distributional Perspective on Reinforcement Learning , 2017, ICML.
[3] Richard S. Sutton,et al. Neuronlike adaptive elements that can solve difficult learning control problems , 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[4] John N. Tsitsiklis,et al. Neuro-dynamic programming: an overview , 1995, Proceedings of 1995 34th IEEE Conference on Decision and Control.
[5] Kavosh Asadi,et al. An Alternative Softmax Operator for Reinforcement Learning , 2016, ICML.
[6] D. Rubin,et al. Maximum likelihood from incomplete data via the EM algorithm plus discussions on the paper , 1977 .
[7] C. Villani. Optimal Transport: Old and New , 2008 .
[8] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[9] Léon Bottou,et al. Wasserstein Generative Adversarial Networks , 2017, ICML.
[10] Guy Lever,et al. Deterministic Policy Gradient Algorithms , 2014, ICML.
[11] Dale Schuurmans,et al. Bridging the Gap Between Value and Policy Based Reinforcement Learning , 2017, NIPS.
[12] Roy Fox,et al. Taming the Noise in Reinforcement Learning via Soft Updates , 2015, UAI.
[13] Hariharan Narayanan,et al. Sample Complexity of Testing the Manifold Hypothesis , 2010, NIPS.
[14] Minwoo Lee,et al. Faster reinforcement learning after pretraining deep networks to predict state dynamics , 2015, 2015 International Joint Conference on Neural Networks (IJCNN).
[15] Eli Upfal,et al. Multi-Armed Bandits in Metric Spaces , 2008 .
[16] D. Bertsekas. Convergence of discretization procedures in dynamic programming , 1975 .
[17] Michail G. Lagoudakis,et al. On the locality of action domination in sequential decision making , 2010, ISAIM.
[18] Catholijn M. Jonker,et al. Learning Multimodal Transition Dynamics for Model-Based Reinforcement Learning , 2017, ArXiv.
[19] Honglak Lee,et al. Action-Conditional Video Prediction using Deep Networks in Atari Games , 2015, NIPS.
[20] Kavosh Asadi,et al. Equivalence Between Wasserstein and Value-Aware Model-based Reinforcement Learning , 2018, ArXiv.
[21] Daniel Nikovski,et al. Value-Aware Loss Function for Model-based Reinforcement Learning , 2017, AISTATS.
[22] Csaba Szepesvári,et al. Algorithms for Reinforcement Learning , 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.
[23] Vicenç Gómez,et al. A unified view of entropy-regularized Markov decision processes , 2017, ArXiv.
[24] Ryota Tomioka,et al. Norm-Based Capacity Control in Neural Networks , 2015, COLT.
[25] R. Bellman. A Markovian Decision Process , 1957 .
[26] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[27] Andrew W. Moore,et al. Reinforcement Learning: A Survey , 1996, J. Artif. Intell. Res..
[28] John Langford,et al. Exploration in Metric State Spaces , 2003, ICML.
[29] Luca Bascetta,et al. Policy gradient in Lipschitz Markov Decision Processes , 2015, Machine Learning.
[30] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[31] Lihong Li,et al. An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning , 2008, ICML '08.
[32] Erik Talvitie,et al. Self-Correcting Models for Model-Based Reinforcement Learning , 2016, AAAI.
[33] D. Freedman,et al. Some Asymptotic Theory for the Bootstrap , 1981 .
[34] Martial Hebert,et al. Improving Multi-Step Prediction of Learned Time Series Models , 2015, AAAI.
[35] Doina Precup,et al. Metrics for Finite Markov Decision Processes , 2004, AAAI.
[36] Erik Talvitie,et al. Model Regularization for Stable Sample Rollouts , 2014, UAI.
[37] Maneesh Kumar Singh,et al. Lipschitz Properties for Deep Convolutional Networks , 2017, ArXiv.
[38] Nan Jiang,et al. The Dependence of Effective Planning Horizon on Model Accuracy , 2015, AAMAS.
[39] Jason Pazis,et al. PAC Optimal Exploration in Continuous Space Markov Decision Processes , 2013, AAAI.
[40] Jürgen Schmidhuber,et al. Learning to Generate Artificial Fovea Trajectories for Target Detection , 1991, Int. J. Neural Syst..
[41] Lacra Pavel,et al. On the Properties of the Softmax Function with Application in Game Theory and Reinforcement Learning , 2017, ArXiv.
[42] Michael Kearns,et al. Near-Optimal Reinforcement Learning in Polynomial Time , 2002, Machine Learning.
[43] Andrew W. Moore,et al. Locally Weighted Learning for Control , 1997, Artificial Intelligence Review.
[44] Yuval Tassa,et al. Continuous control with deep reinforcement learning , 2015, ICLR.
[45] Sergey Levine,et al. Continuous Deep Q-Learning with Model-based Acceleration , 2016, ICML.
[46] Yuval Tassa,et al. Learning Continuous Control Policies by Stochastic Value Gradients , 2015, NIPS.
[47] Andreas Krause,et al. Safe Model-based Reinforcement Learning with Stability Guarantees , 2017, NIPS.
[48] Peter L. Bartlett,et al. Rademacher and Gaussian Complexities: Risk Bounds and Structural Results , 2003, J. Mach. Learn. Res..
[49] Kurt Hornik,et al. Approximation capabilities of multilayer feedforward networks , 1991, Neural Networks.
[50] Karl Hinderer,et al. Lipschitz Continuity of Value Functions in Markovian Decision Processes , 2005, Math. Methods Oper. Res..
[51] R. Tibshirani. Regression Shrinkage and Selection via the Lasso , 1996 .
[52] Csaba Szepesvári,et al. A Generalized Reinforcement-Learning Model: Convergence and Applications , 1996, ICML.
[54] Lihong Li,et al. Reinforcement Learning in Finite MDPs: PAC Analysis , 2009, J. Mach. Learn. Res..
[55] Sergey Levine,et al. Learning Neural Network Policies with Guided Policy Search under Unknown Dynamics , 2014, NIPS.
[56] Bernardo Ávila Pires,et al. Policy Error Bounds for Model-Based Reinforcement Learning with Factored Linear Models , 2016, COLT.
[57] Csaba Szepesvári,et al. X-armed Bandits , 2022 .
[58] Alborz Geramifard,et al. Dyna-Style Planning with Linear Function Approximation and Prioritized Sweeping , 2008, UAI.