[1] S. Resnick, et al. A Probability Path, 1999.
[2] Richard S. Zemel, et al. Prototypical Networks for Few-shot Learning, 2017, NIPS.
[3] Sergey Levine, et al. Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables, 2019, ICML.
[4] Stefano Soatto, et al. A Baseline for Few-Shot Image Classification, 2019, ICLR.
[5] Yoshua Bengio, et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation, 2014, EMNLP.
[6] Philip Bachman, et al. Deep Reinforcement Learning that Matters, 2017, AAAI.
[7] Tom M. Mitchell, et al. The Need for Biases in Learning Generalizations, 2007.
[8] Subhransu Maji, et al. Meta-Learning With Differentiable Convex Optimization, 2019, CVPR.
[9] Peter L. Bartlett, et al. RL$^2$: Fast Reinforcement Learning via Slow Reinforcement Learning, 2016, arXiv.
[10] Hugo Larochelle, et al. Optimization as a Model for Few-Shot Learning, 2016, ICLR.
[11] Herke van Hoof, et al. Addressing Function Approximation Error in Actor-Critic Methods, 2018, ICML.
[12] J. Robins, et al. Doubly Robust Estimation in Missing Data and Causal Inference Models, 2005, Biometrics.
[13] John Langford, et al. Doubly Robust Policy Evaluation and Learning, 2011, ICML.
[14] Peter Stone, et al. Deep Recurrent Q-Learning for Partially Observable MDPs, 2015, AAAI Fall Symposia.
[15] Joseph Kang, et al. Demystifying Double Robustness: A Comparison of Alternative Strategies for Estimating a Population Mean from Incomplete Data, 2007, arXiv:0804.2958.
[16] C. Robert, et al. Rethinking the Effective Sample Size, 2018, International Statistical Review.
[17] Doina Precup, et al. Off-Policy Deep Reinforcement Learning without Exploration, 2018, ICML.
[18] Jonathan Baxter, et al. Learning Internal Representations, 1995, COLT '95.
[19] Neil D. Lawrence, et al. Dataset Shift in Machine Learning, 2009.
[20] Alexander J. Smola, et al. Doubly Robust Covariate Shift Correction, 2015, AAAI.
[21] Joshua Achiam, et al. On First-Order Meta-Learning Algorithms, 2018, arXiv.
[22] Hoon Kim, et al. Monte Carlo Statistical Methods, 2000, Technometrics.
[23] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[24] Sergey Levine, et al. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, 2017, ICML.
[25] Tamim Asfour, et al. ProMP: Proximal Meta-Policy Search, 2018, ICLR.
[26] Guy Lever, et al. Deterministic Policy Gradient Algorithms, 2014, ICML.
[27] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[28] Sepp Hochreiter, et al. Learning to Learn Using Gradient Descent, 2001, ICANN.
[29] Alexander J. Smola, et al. P3O: Policy-on Policy-off Policy Optimization, 2019, UAI.
[30] Larry Rudolph, et al. Are Deep Policy Gradient Algorithms Truly Policy Gradient Algorithms?, 2018, arXiv.
[31] Nando de Freitas, et al. Sequential Monte Carlo Methods in Practice, 2001, Statistics for Engineering and Information Science.
[32] Timothy J. Robinson, et al. Sequential Monte Carlo Methods in Practice, 2003.
[33] C. A. Nelson, et al. Learning to Learn, 2017, Encyclopedia of Machine Learning and Data Mining.
[34] Nikos Komodakis, et al. Dynamic Few-Shot Visual Learning Without Forgetting, 2018, CVPR.
[35] Zeb Kurth-Nelson, et al. Learning to Reinforcement Learn, 2016, CogSci.
[36] Jonathan Baxter, et al. A Model of Inductive Bias Learning, 2000, J. Artif. Intell. Res.
[37] Nan Jiang, et al. Doubly Robust Off-policy Value Evaluation for Reinforcement Learning, 2015, ICML.
[38] Paul E. Utgoff, et al. Shift of Bias for Inductive Concept Learning, 1984.
[39] Yuval Tassa, et al. MuJoCo: A Physics Engine for Model-Based Control, 2012, IROS.
[40] Pieter Abbeel, et al. Evolved Policy Gradients, 2018, NeurIPS.
[41] David Silver, et al. Memory-based Control with Recurrent Neural Networks, 2015, arXiv.
[42] Jürgen Schmidhuber, et al. Shifting Inductive Bias with Success-Story Algorithm, Adaptive Levin Search, and Incremental Self-Improvement, 1997, Machine Learning.
[43] Yuval Tassa, et al. Continuous Control with Deep Reinforcement Learning, 2015, ICLR.
[44] Alexander J. Smola, et al. Linear-Time Estimators for Propensity Scores, 2011, AISTATS.
[45] Sebastian Thrun, et al. Is Learning the n-th Thing Any Easier Than Learning the First?, 1995, NIPS.
[46] Yoshua Bengio, et al. On the Optimization of a Synaptic Learning Rule, 2007.