Meta-Q-Learning
[1] Stefano Soatto, et al. A Baseline for Few-Shot Image Classification, 2019, ICLR.
[2] Alexander J. Smola, et al. P3O: Policy-on Policy-off Policy Optimization, 2019, UAI.
[3] Subhransu Maji, et al. Meta-Learning With Differentiable Convex Optimization, 2019, CVPR.
[4] Sergey Levine, et al. Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables, 2019, ICML.
[5] Doina Precup, et al. Off-Policy Deep Reinforcement Learning without Exploration, 2018, ICML.
[6] Larry Rudolph, et al. A Closer Look at Deep Policy Gradients, 2018, ICLR.
[7] Larry Rudolph, et al. Are Deep Policy Gradient Algorithms Truly Policy Gradient Algorithms?, 2018, ArXiv.
[8] Tamim Asfour, et al. ProMP: Proximal Meta-Policy Search, 2018, ICLR.
[9] C. Robert, et al. Rethinking the Effective Sample Size, 2018, International Statistical Review.
[10] Nikos Komodakis, et al. Dynamic Few-Shot Visual Learning Without Forgetting, 2018, CVPR.
[11] Joshua Achiam, et al. On First-Order Meta-Learning Algorithms, 2018, ArXiv.
[12] J. Schulman, et al. Reptile: A Scalable Metalearning Algorithm, 2018.
[13] Herke van Hoof, et al. Addressing Function Approximation Error in Actor-Critic Methods, 2018, ICML.
[14] Pieter Abbeel, et al. Evolved Policy Gradients, 2018, NeurIPS.
[15] Philip Bachman, et al. Deep Reinforcement Learning that Matters, 2017, AAAI.
[16] Richard S. Zemel, et al. Prototypical Networks for Few-shot Learning, 2017, NIPS.
[17] Sergey Levine, et al. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, 2017, ICML.
[18] Zeb Kurth-Nelson, et al. Learning to Reinforcement Learn, 2016, CogSci.
[19] Hugo Larochelle, et al. Optimization as a Model for Few-Shot Learning, 2016, ICLR.
[20] Peter L. Bartlett, et al. RL$^2$: Fast Reinforcement Learning via Slow Reinforcement Learning, 2016, ArXiv.
[21] David Silver, et al. Memory-based Control with Recurrent Neural Networks, 2015, ArXiv.
[22] Nan Jiang, et al. Doubly Robust Off-policy Value Evaluation for Reinforcement Learning, 2015, ICML.
[23] Yuval Tassa, et al. Continuous Control with Deep Reinforcement Learning, 2015, ICLR.
[24] Peter Stone, et al. Deep Recurrent Q-Learning for Partially Observable MDPs, 2015, AAAI Fall Symposia.
[25] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[26] Alexander J. Smola, et al. Doubly Robust Covariate Shift Correction, 2015, AAAI.
[27] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[28] Guy Lever, et al. Deterministic Policy Gradient Algorithms, 2014, ICML.
[29] Yoshua Bengio, et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation, 2014, EMNLP.
[30] Yuval Tassa, et al. MuJoCo: A Physics Engine for Model-Based Control, 2012, IROS.
[31] Alexander J. Smola, et al. Linear-Time Estimators for Propensity Scores, 2011, AISTATS.
[32] John Langford, et al. Doubly Robust Policy Evaluation and Learning, 2011, ICML.
[33] Neil D. Lawrence, et al. Dataset Shift in Machine Learning, 2009.
[34] Marie Davidian, et al. Comment: Demystifying Double Robustness: A Comparison of Alternative Strategies for Estimating a Population Mean from Incomplete Data, 2008, Statistical Science.
[35] Joseph Kang, et al. Demystifying Double Robustness: A Comparison of Alternative Strategies for Estimating a Population Mean from Incomplete Data, 2007, ArXiv:0804.2973.
[36] J. Robins, et al. Doubly Robust Estimation in Missing Data and Causal Inference Models, 2005, Biometrics.
[37] Sepp Hochreiter, et al. Learning to Learn Using Gradient Descent, 2001, ICANN.
[38] Hoon Kim, et al. Monte Carlo Statistical Methods, 2000, Technometrics.
[39] Jonathan Baxter, et al. A Model of Inductive Bias Learning, 2000, J. Artif. Intell. Res.
[40] Jürgen Schmidhuber, et al. Shifting Inductive Bias with Success-Story Algorithm, Adaptive Levin Search, and Incremental Self-Improvement, 1997, Machine Learning.
[41] Sebastian Thrun, et al. Is Learning the n-th Thing Any Easier Than Learning the First?, 1995, NIPS.
[42] Jonathan Baxter, et al. Learning Internal Representations, 1995, COLT.
[43] C. A. Nelson, et al. Learning to Learn, 2017, Encyclopedia of Machine Learning and Data Mining.
[44] Yoshua Bengio, et al. On the Optimization of a Synaptic Learning Rule, 2007.
[45] Tom M. Mitchell, et al. The Need for Biases in Learning Generalizations, 2007.
[46] Timothy J. Robinson, et al. Sequential Monte Carlo Methods in Practice, 2003.
[47] Nando de Freitas, et al. Sequential Monte Carlo Methods in Practice, 2001, Statistics for Engineering and Information Science.
[48] S. Resnick. A Probability Path, 1999.
[49] Paul E. Utgoff, et al. Shift of Bias for Inductive Concept Learning, 1984.