Peter L. Bartlett | Pieter Abbeel | Xi Chen | Ilya Sutskever | John Schulman | Yan Duan
[1] W. R. Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples, 1933.
[2] J. Gittins. Bandit processes and dynamic allocation indices, 1979.
[3] Peter Whittle, et al. Optimization Over Time, 1982.
[4] Christian M. Ernst, et al. Multi-armed Bandit Allocation Indices, 1989.
[5] Yoshua Bengio, et al. Learning long-term dependencies with gradient descent is difficult, 1994, IEEE Trans. Neural Networks.
[6] Doina Precup, et al. Using Options for Knowledge Transfer in Reinforcement Learning, 1999.
[7] Malcolm J. A. Strens, et al. A Bayesian Framework for Reinforcement Learning, 2000, ICML.
[8] Sepp Hochreiter, et al. Learning to Learn Using Gradient Descent, 2001, ICANN.
[9] Tamer Basar, et al. Dual Control Theory, 2001.
[10] Sepp Hochreiter, et al. Meta-learning with backpropagation, 2001, IJCNN.
[11] Peter Auer, et al. Using Confidence Bounds for Exploitation-Exploration Trade-offs, 2003, J. Mach. Learn. Res.
[12] Junichiro Yoshimoto, et al. Control of exploitation-exploration meta-parameter in reinforcement learning, 2002, Neural Networks.
[13] Kenji Doya, et al. Meta-learning in Reinforcement Learning, 2003, Neural Networks.
[14] Satinder Singh. Transfer of learning by composing solutions of elemental sequential tasks, 2004, Machine Learning.
[15] Ricardo Vilalta, et al. A Perspective View and Survey of Meta-Learning, 2002, Artificial Intelligence Review.
[16] Pietro Perona, et al. One-shot learning of object categories, 2006, IEEE Trans. Pattern Anal. Mach. Intell.
[17] Alan Fern, et al. Multi-task reinforcement learning: a hierarchical Bayesian approach, 2007, ICML.
[18] Peter Auer, et al. Near-optimal Regret Bounds for Reinforcement Learning, 2008, J. Mach. Learn. Res.
[19] G. Evans, et al. Learning to Optimize, 2008.
[20] Yoshua Bengio, et al. Zero-data Learning of New Tasks, 2008, AAAI.
[21] Andrew Y. Ng, et al. Near-Bayesian exploration in polynomial time, 2009, ICML.
[22] Peter Stone, et al. Transfer Learning for Reinforcement Learning Domains: A Survey, 2009, J. Mach. Learn. Res.
[23] David Hsu, et al. Planning under Uncertainty for Robotic Tasks with Mixed Observability, 2010, Int. J. Robotics Res.
[24] Yoshua Bengio, et al. Understanding the difficulty of training deep feedforward neural networks, 2010, AISTATS.
[25] Joshua B. Tenenbaum, et al. One shot learning of simple visual concepts, 2011, CogSci.
[26] Carl E. Rasmussen, et al. PILCO: A Model-Based and Data-Efficient Approach to Policy Search, 2011, ICML.
[27] Lihong Li, et al. An Empirical Evaluation of Thompson Sampling, 2011, NIPS.
[28] Sébastien Bubeck, et al. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, 2012, Found. Trends Mach. Learn.
[29] David S. Leslie, et al. Optimistic Bayesian Sampling in Contextual-Bandit Problems, 2012, J. Mach. Learn. Res.
[30] Benjamin Van Roy, et al. (More) Efficient Reinforcement Learning via Posterior Sampling, 2013, NIPS.
[31] Aditya Mahajan, et al. Multi-Armed Bandits, Gittins Index, and its Calculation, 2014.
[32] Yoshua Bengio, et al. On the Properties of Neural Machine Translation: Encoder–Decoder Approaches, 2014, SSST@EMNLP.
[33] Surya Ganguli, et al. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks, 2013, ICLR.
[34] Yoshua Bengio, et al. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling, 2014, arXiv.
[35] Honglak Lee, et al. Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning, 2014, NIPS.
[36] Sergey Levine, et al. Learning Neural Network Policies with Guided Policy Search under Unknown Dynamics, 2014, NIPS.
[37] Shie Mannor, et al. Bayesian Reinforcement Learning: A Survey, 2015, Found. Trends Mach. Learn.
[38] Thomas B. Schön, et al. From Pixels to Torques: Policy Learning with Deep Dynamical Models, 2015, ICML.
[39] Wojciech Zaremba, et al. An Empirical Exploration of Recurrent Network Architectures, 2015, ICML.
[40] Yuval Tassa, et al. Learning Continuous Control Policies by Stochastic Value Gradients, 2015, NIPS.
[41] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[42] Martin A. Riedmiller, et al. Embed to Control: A Locally Linear Latent Dynamics Model for Control from Raw Images, 2015, NIPS.
[43] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[44] Gregory R. Koch, et al. Siamese Neural Networks for One-Shot Image Recognition, 2015.
[45] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[46] Pieter Abbeel, et al. Benchmarking Deep Reinforcement Learning for Continuous Control, 2016, ICML.
[47] Ruslan Salakhutdinov, et al. Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning, 2015, ICLR.
[48] Marcin Andrychowicz, et al. Learning to learn by gradient descent by gradient descent, 2016, NIPS.
[49] Tim Salimans, et al. Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks, 2016, NIPS.
[50] Honglak Lee, et al. Control of Memory, Active Perception, and Action in Minecraft, 2016, ICML.
[51] Sergey Levine, et al. One-shot learning of manipulation skills with online dynamics adaptation and neural network priors, 2016, IROS.
[52] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[53] Xinyun Chen, et al. Delving into Transferable Adversarial Examples and Black-box Attacks, 2016, ICLR.
[54] Martín Abadi, et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems, 2016, arXiv.
[55] Wojciech Jaskowski, et al. ViZDoom: A Doom-based AI research platform for visual reinforcement learning, 2016, CIG.
[56] Sergey Levine, et al. End-to-End Training of Deep Visuomotor Policies, 2015, J. Mach. Learn. Res.
[57] Daan Wierstra, et al. One-shot Learning with Memory-Augmented Neural Networks, 2016, arXiv.
[58] Oriol Vinyals, et al. Matching Networks for One Shot Learning, 2016, NIPS.
[59] Sergey Levine, et al. High-Dimensional Continuous Control Using Generalized Advantage Estimation, 2015, ICLR.
[61] Razvan Pascanu, et al. Sim-to-Real Robot Learning from Pixels with Progressive Nets, 2016, CoRL.
[62] Benjamin Van Roy, et al. Why is Posterior Sampling Better than Optimism for Reinforcement Learning?, 2016, ICML.
[63] Sergey Levine, et al. Learning modular neural network policies for multi-task and multi-robot transfer, 2016, ICRA.