Alexander J. Smola | Rasool Fakoor | Jonas Mueller | Pratik Chaudhari
[1] Mohammad Norouzi, et al. An Optimistic Perspective on Offline Reinforcement Learning, 2020, ICML.
[2] Jing Peng, et al. Function Optimization using Connectionist Reinforcement Learning Algorithms, 1991.
[3] Alexander J. Smola, et al. DDPG++: Striving for Simplicity in Continuous-control Off-Policy Reinforcement Learning, 2020, ArXiv.
[4] Marc G. Bellemare, et al. Count-Based Exploration with Neural Density Models, 2017, ICML.
[6] Yisong Yue, et al. Batch Policy Learning under Constraints, 2019, ICML.
[7] Geoffrey J. Gordon, et al. A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning, 2010, AISTATS.
[8] Alexander J. Smola, et al. P3O: Policy-on Policy-off Policy Optimization, 2019, UAI.
[9] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[10] Martin A. Riedmiller, et al. Keep Doing What Worked: Behavioral Modelling Priors for Offline Reinforcement Learning, 2020, ICLR.
[11] Tom Schaul, et al. Learning from Demonstrations for Real World Reinforcement Learning, 2017, ArXiv.
[12] Csaba Szepesvári, et al. Fitted Q-iteration in continuous action-space MDPs, 2007, NIPS.
[14] Thorsten Joachims, et al. MOReL: Model-Based Offline Reinforcement Learning, 2020, NeurIPS.
[15] Martin A. Riedmiller, et al. Batch Reinforcement Learning, 2012, Reinforcement Learning.
[16] Martin J. Wainwright, et al. Estimating Divergence Functionals and the Likelihood Ratio by Convex Risk Minimization, 2008, IEEE Transactions on Information Theory.
[17] Qing Wang, et al. Exponentially Weighted Imitation Learning for Batched Historical Data, 2018, NeurIPS.
[18] Imre Csiszár, et al. Information Theory - Coding Theorems for Discrete Memoryless Systems, Second Edition, 2011.
[19] Tom Schaul, et al. Dueling Network Architectures for Deep Reinforcement Learning, 2015, ICML.
[20] Sergey Levine, et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018, ICML.
[21] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[22] David Silver, et al. Deep Reinforcement Learning with Double Q-Learning, 2015, AAAI.
[23] Sergey Levine, et al. Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction, 2019, NeurIPS.
[24] Zhuoran Yang, et al. Is Pessimism Provably Efficient for Offline RL?, 2020, ICML.
[25] A. Müller. Integral Probability Metrics and Their Generating Classes of Functions, 1997, Advances in Applied Probability.
[26] F. Wilcoxon. Individual Comparisons by Ranking Methods, 1945.
[27] Gabriel Dulac-Arnold, et al. Challenges of Real-World Reinforcement Learning, 2019, ArXiv.
[28] Sergey Levine, et al. D4RL: Datasets for Deep Data-Driven Reinforcement Learning, 2020, ArXiv.
[29] Pieter Abbeel, et al. An Algorithmic Perspective on Imitation Learning, 2018, Found. Trends Robotics.
[30] Philip S. Thomas, et al. High Confidence Policy Improvement, 2015, ICML.
[31] Tom Schaul, et al. Unifying Count-Based Exploration and Intrinsic Motivation, 2016, NIPS.
[32] Peter L. Bartlett, et al. Rademacher and Gaussian Complexities: Risk Bounds and Structural Results, 2003, J. Mach. Learn. Res.
[33] Imre Csiszár, et al. Information Theory and Statistics: A Tutorial, 2004, Found. Trends Commun. Inf. Theory.
[34] Jon A. Wellner, et al. Weak Convergence and Empirical Processes: With Applications to Statistics, 1996.
[35] Vladimir Vapnik, et al. An overview of statistical learning theory, 1999, IEEE Trans. Neural Networks.
[36] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[37] Lantao Yu, et al. MOPO: Model-based Offline Policy Optimization, 2020, NeurIPS.
[38] S. Levine, et al. Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems, 2020, ArXiv.
[39] Sergey Levine, et al. Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning, 2019, ArXiv.
[40] Ameet Talwalkar, et al. Foundations of Machine Learning, 2012, Adaptive Computation and Machine Learning.
[41] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[42] S. R. Jammalamadaka, et al. Empirical Processes in M-Estimation, 2001.
[43] Doina Precup, et al. Off-Policy Deep Reinforcement Learning without Exploration, 2018, ICML.
[44] Andrew G. Barto, et al. Reinforcement Learning, 1998.
[45] Seyed Kamyar Seyed Ghasemipour, et al. EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline and Online RL, 2020, ICML.
[46] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[47] S. Levine, et al. Conservative Q-Learning for Offline Reinforcement Learning, 2020, NeurIPS.
[48] Martha White, et al. Maxmin Q-learning: Controlling the Estimation Bias of Q-learning, 2020, ICLR.
[49] Dimitri P. Bertsekas, et al. Stochastic Optimal Control: The Discrete Time Case, 2007.
[50] Hado van Hasselt, et al. Double Q-learning, 2010, NIPS.
[51] Michail G. Lagoudakis, et al. Least-Squares Policy Iteration, 2003, J. Mach. Learn. Res.
[52] Sebastian Thrun, et al. Issues in Using Function Approximation for Reinforcement Learning, 1999.
[53] Yao Liu, et al. Interpretable Off-Policy Evaluation in Reinforcement Learning by Highlighting Influential Transitions, 2020, ICML.
[54] Yifan Wu, et al. Behavior Regularized Offline Reinforcement Learning, 2019, ArXiv.
[55] Mohammad Norouzi, et al. An Optimistic Perspective on Offline Deep Reinforcement Learning, 2020, ICML.
[56] Richard Y. Chen, et al. UCB Exploration via Q-Ensembles, 2018.
[57] Vladimir Vapnik, et al. Statistical Learning Theory, 1998.
[58] Csaba Szepesvári, et al. Efficient approximate planning in continuous space Markovian Decision Problems, 2001, AI Commun.
[59] Dean Pomerleau, et al. Efficient Training of Artificial Neural Networks for Autonomous Navigation, 1991, Neural Computation.
[60] Guy Lever, et al. Deterministic Policy Gradient Algorithms, 2014, ICML.
[61] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[62] Sean R. Eddy, et al. What is dynamic programming?, 2004, Nature Biotechnology.
[63] Emma Brunskill, et al. Provably Good Batch Off-Policy Reinforcement Learning Without Great Exploration, 2020, NeurIPS.
[64] Alexander J. Smola, et al. Unifying Divergence Minimization and Statistical Inference Via Convex Duality, 2006, COLT.
[65] Dmitry Vetrov, et al. Controlling Overestimation Bias with Truncated Mixture of Continuous Distributional Quantile Critics, 2020, ICML.
[66] Bart De Schutter, et al. Reinforcement Learning and Dynamic Programming Using Function Approximators, 2010.
[67] Herke van Hoof, et al. Addressing Function Approximation Error in Actor-Critic Methods, 2018, ICML.
[68] Sergey Levine, et al. Benchmarks for Deep Off-Policy Evaluation, 2021, ICLR.
[69] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[70] Demis Hassabis, et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play, 2018, Science.
[71] Olivier Sigaud, et al. The problem with DDPG: understanding failures in deterministic environments with sparse rewards, 2019, ICANN.
[73] Nando de Freitas, et al. Hyperparameter Selection for Offline Reinforcement Learning, 2020, ArXiv.
[74] Yuxi Li, et al. Deep Reinforcement Learning: An Overview, 2017, ArXiv.
[75] Marc G. Bellemare, et al. The Importance of Pessimism in Fixed-Dataset Policy Optimization, 2020, ArXiv.
[76] Nando de Freitas, et al. Critic Regularized Regression, 2020, NeurIPS.