On the Role of Weight Sharing During Deep Option Learning
[1] Andrew G. Barto, et al. Conjugate Markov Decision Processes, 2011, ICML.
[2] Pierre-Yves Oudeyer, et al. How Many Random Seeds? Statistical Power Analysis in Deep Reinforcement Learning Experiments, 2018, ArXiv.
[3] Doina Precup, et al. Eligibility Traces for Off-Policy Policy Evaluation, 2000, ICML.
[4] Philip S. Thomas, et al. Reinforcement Learning Without Backpropagation or a Clock, 2019.
[5] Doina Precup, et al. Learnings Options End-to-End for Continuous Action Tasks, 2017, ArXiv.
[6] Matthew Riemer, et al. Representation Stability as a Regularizer for Improved Text Analytics Transfer Learning, 2017, ArXiv.
[7] Michael McCloskey, et al. Catastrophic Interference in Connectionist Networks: The Sequential Learning Problem, 1989.
[8] Doina Precup, et al. The Option-Critic Architecture, 2016, AAAI.
[9] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[10] Sergey Levine, et al. Data-Efficient Hierarchical Reinforcement Learning, 2018, NeurIPS.
[11] Quoc V. Le, et al. Diversity and Depth in Per-Example Routing Models, 2018, ICLR.
[12] Nahum Shimkin, et al. Unified Inter and Intra Options Learning Using Policy Gradient Methods, 2011, EWRL.
[13] Derek Hoiem, et al. Learning without Forgetting, 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[14] Ignacio Cases, et al. Routing Networks and the Challenges of Modular and Compositional Computation, 2019, ArXiv.
[15] Rich Caruana, et al. Multitask Learning, 1997, Machine-mediated learning.
[16] Philip S. Thomas, et al. Policy Gradient Coagent Networks, 2011, NIPS.
[17] Sophia Krasikov, et al. A Deep Learning and Knowledge Transfer Based Architecture for Social Media User Characteristic Determination, 2015, SocialNLP@NAACL.
[18] Razvan Pascanu, et al. Progressive Neural Networks, 2016, ArXiv.
[19] Gerald Tesauro, et al. Learning to Learn without Forgetting By Maximizing Transfer and Minimizing Interference, 2018, ICLR.
[20] Chrisantha Fernando, et al. PathNet: Evolution Channels Gradient Descent in Super Neural Networks, 2017, ArXiv.
[21] Leonid Peshkin, et al. Learning from Scarce Experience, 2002, ICML.
[22] Gerald Tesauro, et al. Learning Abstract Options, 2018, NeurIPS.
[23] Djallel Bouneffouf, et al. Scalable Recollections for Continual Lifelong Learning, 2017, AAAI.
[24] Doina Precup, et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artif. Intell.
[25] Geoffrey E. Hinton, et al. Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer, 2017, ICLR.
[26] Doina Precup, et al. When Waiting is not an Option: Learning Options with a Deliberation Cost, 2017, AAAI.
[27] M. Franceschini, et al. Generative Knowledge Distillation for General Purpose Function Compression, 2017.
[28] Philip S. Thomas, et al. Asynchronous Coagent Networks: Stochastic Networks for Reinforcement Learning without Backpropagation or a Clock, 2019, ICML.
[29] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[30] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[31] Anthony V. Robins, et al. Catastrophic Forgetting, Rehearsal and Pseudorehearsal, 1995, Connect. Sci.
[32] Jiwon Kim, et al. Continual Learning with Deep Generative Replay, 2017, NIPS.
[33] Pieter Abbeel, et al. On a Connection between Importance Sampling and the Likelihood Ratio Policy Gradient, 2010, NIPS.
[34] Doina Precup, et al. Temporal abstraction in reinforcement learning, 2000, ICML.
[35] Martial Hebert, et al. Cross-Stitch Networks for Multi-task Learning, 2016, CVPR.
[36] Christopher Potts, et al. Recursive Routing Networks: Learning to Compose Modules for Language Understanding, 2019, NAACL.
[37] Joachim Bingel, et al. Sluice networks: Learning what to share between loosely related tasks, 2017, ArXiv.
[38] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[39] Richard S. Sutton, et al. Weighted importance sampling for off-policy learning with linear function approximation, 2014, NIPS.
[40] Demis Hassabis, et al. Neural Episodic Control, 2017, ICML.
[41] Marc G. Bellemare, et al. The Arcade Learning Environment: An Evaluation Platform for General Agents, 2012, J. Artif. Intell. Res.
[42] Joelle Pineau, et al. Conditional Computation in Neural Networks for faster models, 2015, ArXiv.
[43] Matthew Riemer, et al. Routing Networks: Adaptive Selection of Non-linear Functions for Multi-Task Learning, 2017, ICLR.