An Introduction to Deep Reinforcement Learning
Vincent François-Lavet | Peter Henderson | Riashat Islam | Marc G. Bellemare | Joelle Pineau
[1] D. Whitteridge. Lectures on Conditioned Reflexes , 1942, Nature.
[2] Claude E. Shannon,et al. Programming a computer for playing chess , 1950 .
[3] R. Bellman. A Markovian Decision Process , 1957 .
[4] Arthur L. Samuel,et al. Some Studies in Machine Learning Using the Game of Checkers , 1967, IBM J. Res. Dev..
[5] Stuart E. Dreyfus,et al. Applied Dynamic Programming , 1965 .
[6] Walter Dandy,et al. The Brain , 1966 .
[7] R. Rescorla. A theory of pavlovian conditioning: The effectiveness of reinforcement and non-reinforcement , 1972 .
[8] D. Vere-Jones. Markov Chains , 1972, Nature.
[9] S. C. Jaquette. Markov Decision Processes with a New Optimality Criterion: Discrete Time , 1973 .
[10] Edward J. Sondik,et al. The Optimal Control of Partially Observable Markov Processes over the Infinite Horizon: Discounted Costs , 1978, Oper. Res..
[11] Kunihiko Fukushima,et al. Neocognitron: A Self-Organizing Neural Network Model for a Mechanism of Visual Pattern Recognition , 1982 .
[12] Richard S. Sutton,et al. Neuronlike adaptive elements that can solve difficult learning control problems , 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[13] Richard S. Sutton,et al. Temporal credit assignment in reinforcement learning , 1984 .
[14] Geoffrey E. Hinton,et al. Learning representations by back-propagating errors , 1986, Nature.
[15] C. Watkins. Learning from delayed rewards , 1989 .
[16] Andrew W. Moore,et al. Efficient memory-based learning for robot control , 1990 .
[17] B. Widrow,et al. Neural networks for self-learning control systems , 1990, IEEE Control Systems Magazine.
[18] Elie Bienenstock,et al. Neural Networks and the Bias/Variance Dilemma , 1992, Neural Computation.
[19] Sebastian Thrun,et al. Efficient Exploration In Reinforcement Learning , 1992 .
[20] Bernd Brügmann. Monte Carlo Go , 1993 .
[21] Terrence J. Sejnowski,et al. Temporal Difference Learning of Position Evaluation in the Game of Go , 1993, NIPS.
[22] Andrew W. Moore,et al. Generalization in Reinforcement Learning: Safely Approximating the Value Function , 1994, NIPS.
[23] Michael L. Littman,et al. Markov Games as a Framework for Multi-Agent Reinforcement Learning , 1994, ICML.
[24] Michael I. Jordan,et al. Learning Without State-Estimation in Partially Observable Markovian Decision Processes , 1994, ICML.
[25] Deborah Silver,et al. Feature Visualization , 1994, Scientific Visualization.
[26] Gerald Tesauro,et al. Temporal difference learning and TD-Gammon , 1995, CACM.
[27] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Two Volume Set , 1995 .
[28] Richard S. Sutton,et al. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding , 1995, NIPS.
[29] Geoffrey J. Gordon. Stable Fitted Reinforcement Learning , 1995, NIPS.
[30] Inman Harvey,et al. Noise and the Reality Gap: The Use of Simulation in Evolutionary Robotics , 1995, ECAL.
[31] Leemon C. Baird,et al. Residual Algorithms: Reinforcement Learning with Function Approximation , 1995, ICML.
[32] Andrew McCallum,et al. Reinforcement learning with selective perception and hidden state , 1996 .
[33] John N. Tsitsiklis,et al. Analysis of Temporal-Difference Learning with Function Approximation , 1996, NIPS.
[34] Peter Dayan,et al. A Neural Substrate of Prediction and Reward , 1997, Science.
[35] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[36] Richard S. Sutton,et al. Roles of Macro-Actions in Accelerating Reinforcement Learning , 1998 .
[37] Milos Hauskrecht,et al. Hierarchical Solution of Markov Decision Processes using Macro-actions , 1998, UAI.
[38] Leslie Pack Kaelbling,et al. Planning and Acting in Partially Observable Stochastic Domains , 1998, Artif. Intell..
[39] Yoshua Bengio,et al. Gradient-based learning applied to document recognition , 1998, Proc. IEEE.
[40] Thomas G. Dietterich. Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms , 1998, Neural Computation.
[41] Vladimir Vapnik,et al. Statistical learning theory , 1998 .
[42] Yoshua Bengio,et al. Convolutional networks for images, speech, and time series , 1998 .
[43] Shun-ichi Amari,et al. Natural Gradient Works Efficiently in Learning , 1998, Neural Computation.
[44] Stuart J. Russell,et al. Bayesian Q-Learning , 1998, AAAI/IAAI.
[45] Doina Precup,et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning , 1999, Artif. Intell..
[46] David Andre,et al. Model based Bayesian Exploration , 1999, UAI.
[47] Jay H. Lee,et al. Model predictive control: past, present and future , 1999 .
[48] Geoffrey J. Gordon,et al. Approximate solutions to markov decision processes , 1999 .
[49] John N. Tsitsiklis,et al. Actor-Critic Algorithms , 1999, NIPS.
[50] Andrew Y. Ng,et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping , 1999, ICML.
[51] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[52] Doina Precup,et al. Eligibility Traces for Off-Policy Policy Evaluation , 2000, ICML.
[53] Andrew Y. Ng,et al. Algorithms for Inverse Reinforcement Learning , 2000, ICML.
[54] Manuela M. Veloso,et al. Layered Learning , 2000, ECML.
[55] Peter L. Bartlett,et al. Rademacher and Gaussian Complexities: Risk Bounds and Structural Results , 2003, J. Mach. Learn. Res..
[56] Sepp Hochreiter,et al. Learning to Learn Using Gradient Descent , 2001, ICANN.
[57] Sham M. Kakade,et al. A Natural Policy Gradient , 2001, NIPS.
[58] Murray Campbell,et al. Deep Blue , 2002, Artif. Intell..
[59] Clay B. Holroyd,et al. The neural basis of human error processing: reinforcement learning, dopamine, and the error-related negativity. , 2002, Psychological review.
[60] Ronen I. Brafman,et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning , 2001, J. Mach. Learn. Res..
[61] D. Braziunas. POMDP solution methods , 2003 .
[62] Joelle Pineau,et al. Point-based value iteration: An anytime algorithm for POMDPs , 2003, IJCAI.
[63] John Langford,et al. Exploration in Metric State Spaces , 2003, ICML.
[64] Remco R. Bouckaert,et al. Choosing Between Two Learning Algorithms Based on Calibrated Tests , 2003, ICML.
[65] Peter Dayan,et al. Q-learning , 1992, Machine Learning.
[66] Gareth James,et al. Variance and Bias for General Loss Functions , 2003, Machine Learning.
[67] Corinna Cortes,et al. Support-Vector Networks , 1995, Machine Learning.
[68] Richard S. Sutton,et al. Reinforcement learning with replacing eligibility traces , 2004, Machine Learning.
[69] Tommi S. Jaakkola,et al. Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms , 2000, Machine Learning.
[70] Eibe Frank,et al. Evaluating the Replicability of Significance Tests for Comparing Learning Algorithms , 2004, PAKDD.
[71] Andrew W. Moore,et al. Variable Resolution Discretization in Optimal Control , 2002, Machine Learning.
[72] Anja Vogler,et al. An Introduction to Multivariate Statistical Analysis , 2004 .
[73] Colin Camerer,et al. Neuroeconomics: How Neuroscience Can Inform Economics , 2005 .
[74] R. J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.
[75] A. Barto,et al. An algebraic approach to abstraction in reinforcement learning , 2004 .
[76] Michael Kearns,et al. Near-Optimal Reinforcement Learning in Polynomial Time , 1998, Machine Learning.
[77] Longxin Lin. Self-Improving Reactive Agents Based on Reinforcement Learning, Planning and Teaching , 2004, Machine Learning.
[78] Jing Peng,et al. Incremental multi-step Q-learning , 1994, Machine Learning.
[79] Pieter Abbeel,et al. Apprenticeship learning via inverse reinforcement learning , 2004, ICML.
[80] Martin A. Riedmiller. Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method , 2005, ECML.
[81] Pierre Geurts,et al. Tree-Based Batch Mode Reinforcement Learning , 2005, J. Mach. Learn. Res..
[82] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[83] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[84] Kurt Driessens,et al. Relational Reinforcement Learning , 1998, Machine-mediated learning.
[85] Pierre Geurts,et al. Extremely randomized trees , 2006, Machine Learning.
[86] Olivier Teytaud,et al. Modification of UCT with Patterns in Monte-Carlo Go , 2006 .
[87] Janez Demsar,et al. Statistical Comparisons of Classifiers over Multiple Data Sets , 2006, J. Mach. Learn. Res..
[88] Radford M. Neal. Pattern Recognition and Machine Learning , 2007, Technometrics.
[89] Angela J. Yu,et al. Should I stay or should I go? How the human brain manages the trade-off between exploitation and exploration , 2007, Philosophical Transactions of the Royal Society B: Biological Sciences.
[90] Nasser M. Nasrabadi,et al. Pattern Recognition and Machine Learning , 2006, Technometrics.
[91] Andy Liaw,et al. Classification and Regression by randomForest , 2007 .
[92] Csaba Szepesvári,et al. Apprenticeship Learning using Inverse Reinforcement Learning and Gradient Methods , 2007, UAI.
[93] Michael L. Littman,et al. Efficient Reinforcement Learning with Relocatable Action Models , 2007, AAAI.
[94] Louis Wehenkel,et al. Variable selection for dynamic treatment regimes: a reinforcement learning approach , 2008 .
[95] P. Dayan,et al. Decision theory, reinforcement learning, and the brain , 2008, Cognitive, affective & behavioral neuroscience.
[96] P. Dayan,et al. Reinforcement learning: The Good, The Bad and The Ugly , 2008, Current Opinion in Neurobiology.
[97] Marek Petrik,et al. Biasing Approximate Dynamic Programming with a Lower Discount Factor , 2008, NIPS.
[98] Thomas G. Dietterich. Machine Learning and Ecosystem Informatics: Challenges and Opportunities , 2009, ACML.
[99] Andrew Y. Ng,et al. Near-Bayesian exploration in polynomial time , 2009, ICML '09.
[100] Shimon Whiteson,et al. Automatic Feature Selection for Model-Based Reinforcement Learning in Factored MDPs , 2009, 2009 International Conference on Machine Learning and Applications.
[101] Y. Niv. Reinforcement learning in the brain , 2009 .
[102] Pascal Vincent,et al. Visualizing Higher-Layer Features of a Deep Network , 2009 .
[103] Carl E. Rasmussen,et al. Gaussian processes for machine learning , 2005, Adaptive computation and machine learning.
[104] Brian Tanner,et al. RL-Glue: Language-Independent Software for Reinforcement-Learning Experiments , 2009, J. Mach. Learn. Res..
[105] Jason Weston,et al. Curriculum learning , 2009, ICML '09.
[106] Wouter Josemans. Generalization in Reinforcement Learning , 2009 .
[107] P. Montague,et al. Theoretical and Empirical Studies of Learning , 2009 .
[108] Monica Dinculescu,et al. Approximate Predictive Representations of Partially Observable Systems , 2010, ICML.
[109] Masashi Sugiyama,et al. Nonparametric Return Distribution Approximation for Reinforcement Learning , 2010, ICML.
[110] J. Andrew Bagnell,et al. Modeling Purposeful Adaptive Behavior with the Principle of Maximum Causal Entropy , 2010 .
[111] Jürgen Schmidhuber,et al. Formal Theory of Creativity, Fun, and Intrinsic Motivation (1990–2010) , 2010, IEEE Transactions on Autonomous Mental Development.
[112] Hado van Hasselt,et al. Double Q-learning , 2010, NIPS.
[113] A. Casadevall,et al. Reproducible Science , 2010, Infection and Immunity.
[114] Gaël Varoquaux,et al. Scikit-learn: Machine Learning in Python , 2011, J. Mach. Learn. Res..
[115] Shimon Whiteson,et al. Protecting against evaluation overfitting in empirical reinforcement learning , 2011, 2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL).
[116] Rémi Munos,et al. Pure exploration in finitely-armed and continuous-armed bandits , 2011, Theor. Comput. Sci..
[117] Yi Sun,et al. Planning to Be Surprised: Optimal Bayesian Exploration in Dynamic Environments , 2011, AGI.
[118] Martin A. Riedmiller,et al. Reinforcement learning in feedback control , 2011, Machine Learning.
[119] Carl E. Rasmussen,et al. PILCO: A Model-Based and Data-Efficient Approach to Policy Search , 2011, ICML.
[120] D. Kahneman. Thinking, Fast and Slow , 2011 .
[121] Jan Peters,et al. Relative Entropy Inverse Reinforcement Learning , 2011, AISTATS.
[122] Wei Chu,et al. Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms , 2010, WSDM '11.
[123] Ian D. Watson,et al. Applying reinforcement learning to small scale combat in the real-time strategy game StarCraft:Broodwar , 2012, 2012 IEEE Conference on Computational Intelligence and Games (CIG).
[124] Kevin P. Murphy,et al. Machine learning - a probabilistic perspective , 2012, Adaptive computation and machine learning series.
[125] Regina Barzilay,et al. Learning High-Level Planning from Text , 2012, ACL.
[126] H. Seo,et al. Neural basis of reinforcement learning and decision making. , 2012, Annual review of neuroscience.
[127] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[128] Yuval Tassa,et al. MuJoCo: A physics engine for model-based control , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[129] Simon M. Lucas,et al. A Survey of Monte Carlo Tree Search Methods , 2012, IEEE Transactions on Computational Intelligence and AI in Games.
[130] Bruno Castro da Silva,et al. Learning Parameterized Skills , 2012, ICML.
[131] Anton Nekrutenko,et al. Ten Simple Rules for Reproducible Computational Research , 2013, PLoS Comput. Biol..
[132] Louis Wehenkel,et al. Batch mode reinforcement learning based on the synthesis of artificial trajectories , 2013, Ann. Oper. Res..
[133] Sergey Levine,et al. Guided Policy Search , 2013, ICML.
[134] Stefan Schaal,et al. Learning objective functions for manipulation , 2013, 2013 IEEE International Conference on Robotics and Automation.
[135] P. Montague,et al. Reinforcement Learning Models Then-and-Now: From Single Cells to Modern Neuroimaging , 2013 .
[136] Pieter Abbeel,et al. Learning from Demonstrations Through the Use of Non-rigid Registration , 2013, ISRR.
[137] Qiang Yang,et al. Lifelong Machine Learning Systems: Beyond Learning Algorithms , 2013, AAAI Spring Symposium: Lifelong Machine Learning.
[138] S. Barry Cooper,et al. Digital Computers Applied to Games , 2013 .
[139] Kris K. Hauser,et al. Artificial intelligence framework for simulating clinical decision-making: A Markov decision process approach , 2013, Artif. Intell. Medicine.
[140] Nitish Srivastava,et al. Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..
[141] Philip Thomas,et al. Bias in Natural Actor-Critic Algorithms , 2014, ICML.
[142] Sergey Levine,et al. Offline policy evaluation across representations with applications to educational games , 2014, AAMAS.
[143] Yoshua Bengio,et al. Generative Adversarial Nets , 2014, NIPS.
[144] Christoph Salge,et al. Changing the Environment Based on Empowerment as Intrinsic Motivation , 2014, Entropy.
[145] Guy Lever,et al. Deterministic Policy Gradient Algorithms , 2014, ICML.
[146] Ronald Ortner,et al. Selecting Near-Optimal Approximate State Representations in Reinforcement Learning , 2014, ALT.
[147] Matthew E. Taylor,et al. Multi-objectivization of reinforcement learning problems by reward shaping , 2014, 2014 International Joint Conference on Neural Networks (IJCNN).
[148] Alex Graves,et al. Neural Turing Machines , 2014, ArXiv.
[149] Giles W. Story,et al. Does temporal discounting explain unhealthy behavior? A systematic review and reinforcement learning perspective , 2014, Front. Behav. Neurosci..
[150] Nan Jiang,et al. Abstraction Selection in Model-based Reinforcement Learning , 2015, ICML.
[151] Shie Mannor,et al. Bayesian Reinforcement Learning: A Survey , 2015, Found. Trends Mach. Learn..
[152] Jürgen Schmidhuber,et al. Deep learning in neural networks: An overview , 2014, Neural Networks.
[153] Yoshua Bengio,et al. Towards Biologically Plausible Deep Learning , 2015, ArXiv.
[154] Sergey Levine,et al. Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models , 2015, ArXiv.
[155] Thomas B. Schön,et al. From Pixels to Torques: Policy Learning with Deep Dynamical Models , 2015, ICML 2015.
[156] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
[157] Yoshua Bengio,et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.
[158] D. Curran‐Everett,et al. The fickle P value generates irreproducible results , 2015, Nature Methods.
[159] Tom Schaul,et al. Universal Value Function Approximators , 2015, ICML.
[160] Yuval Tassa,et al. Learning Continuous Control Policies by Stochastic Value Gradients , 2015, NIPS.
[161] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[162] Nan Jiang,et al. The Dependence of Effective Planning Horizon on Model Accuracy , 2015, AAMAS.
[163] Dirk Lindebaum. Sapiens: A Brief History of Humankind - A Review , 2015 .
[164] Geoffrey E. Hinton,et al. Deep Learning , 2015, Nature.
[165] Shakir Mohamed,et al. Variational Information Maximisation for Intrinsically Motivated Reinforcement Learning , 2015, NIPS.
[166] Damien Ernst,et al. How to Discount Deep Reinforcement Learning: Towards New Dynamic Strategies , 2015, ArXiv.
[167] Martin A. Riedmiller,et al. Embed to Control: A Locally Linear Latent Dynamics Model for Control from Raw Images , 2015, NIPS.
[168] Jianfeng Gao,et al. Recurrent Reinforcement Learning: A Hybrid Approach , 2015, ArXiv.
[169] Zoran Popovic,et al. Interactive Control of Diverse Complex Characters with Neural Networks , 2015, NIPS.
[170] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[171] Honglak Lee,et al. Action-Conditional Video Prediction using Deep Networks in Atari Games , 2015, NIPS.
[172] Michael S. Bernstein,et al. ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.
[173] Peter Stone,et al. Deep Recurrent Q-Learning for Partially Observable MDPs , 2015, AAAI Fall Symposia.
[174] Marc G. Bellemare,et al. The Arcade Learning Environment: An Evaluation Platform for General Agents (Extended Abstract) , 2012, IJCAI.
[175] Alborz Geramifard,et al. RLPy: a value-function-based reinforcement learning framework for education and research , 2015, J. Mach. Learn. Res..
[176] V. Upadhyay. Capital in the Twenty-First Century , 2015 .
[177] Yuval Tassa,et al. Continuous control with deep reinforcement learning , 2015, ICLR.
[178] Sergey Levine,et al. Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization , 2016, ICML.
[179] Xin Zhang,et al. End to End Learning for Self-Driving Cars , 2016, ArXiv.
[180] Pieter Abbeel,et al. Benchmarking Deep Reinforcement Learning for Continuous Control , 2016, ICML.
[181] Tianqi Chen,et al. Net2Net: Accelerating Learning via Knowledge Transfer , 2015, ICLR.
[182] Yann LeCun,et al. Deep multi-scale video prediction beyond mean square error , 2015, ICLR.
[183] Razvan Pascanu,et al. Policy Distillation , 2015, ICLR.
[184] Ruslan Salakhutdinov,et al. Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning , 2015, ICLR.
[185] Anca D. Dragan,et al. Cooperative Inverse Reinforcement Learning , 2016, NIPS.
[186] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[187] Nan Jiang,et al. Doubly Robust Off-policy Value Evaluation for Reinforcement Learning , 2015, ICML.
[188] Filip De Turck,et al. VIME: Variational Information Maximizing Exploration , 2016, NIPS.
[189] Marc'Aurelio Ranzato,et al. Sequence Level Training with Recurrent Neural Networks , 2015, ICLR.
[190] Murray Shanahan,et al. Towards Deep Symbolic Reinforcement Learning , 2016, ArXiv.
[191] David Silver,et al. Deep Reinforcement Learning with Double Q-Learning , 2015, AAAI.
[192] Zachary Chase Lipton,et al. Efficient Exploration for Dialogue Policy Learning with BBQ Networks & Replay Buffer Spiking , 2016 .
[193] Roy Fox,et al. Taming the Noise in Reinforcement Learning via Soft Updates , 2015, UAI.
[194] Stefano Ermon,et al. Generative Adversarial Imitation Learning , 2016, NIPS.
[195] Benjamin Van Roy,et al. Deep Exploration via Bootstrapped DQN , 2016, NIPS.
[196] Alex Graves,et al. Strategic Attentive Writer for Learning Macro-Actions , 2016, NIPS.
[197] Tom Schaul,et al. Dueling Network Architectures for Deep Reinforcement Learning , 2015, ICML.
[198] Rob Fergus,et al. Learning Multiagent Communication with Backpropagation , 2016, NIPS.
[199] Honglak Lee,et al. Control of Memory, Active Perception, and Action in Minecraft , 2016, ICML.
[200] Sergey Levine,et al. Adapting Deep Visuomotor Representations with Weak Pairwise Constraints , 2015, WAFR.
[201] M. Baker. 1,500 scientists lift the lid on reproducibility , 2016, Nature.
[202] Damien Ernst,et al. Deep Reinforcement Learning Solutions for Energy Microgrids Management , 2016 .
[203] Jürgen Schmidhuber,et al. A Machine Learning Approach to Visual Perception of Forest Trails for Mobile Robots , 2016, IEEE Robotics and Automation Letters.
[204] Alex Graves,et al. Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.
[205] Tom Schaul,et al. Unifying Count-Based Exploration and Intrinsic Motivation , 2016, NIPS.
[206] Pieter Abbeel,et al. Value Iteration Networks , 2016, NIPS.
[207] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
[208] Peter Stone,et al. Source Task Creation for Curriculum Learning , 2016, AAMAS.
[209] Marc G. Bellemare,et al. Q(λ) with Off-Policy Corrections , 2016 .
[210] Peter L. Bartlett,et al. RL²: Fast Reinforcement Learning via Slow Reinforcement Learning , 2016, ArXiv.
[211] Martín Abadi,et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems , 2016, ArXiv.
[212] Wojciech Jaskowski,et al. ViZDoom: A Doom-based AI research platform for visual reinforcement learning , 2016, 2016 IEEE Conference on Computational Intelligence and Games (CIG).
[213] Shane Legg,et al. DeepMind Lab , 2016, ArXiv.
[214] Nando de Freitas,et al. Neural Programmer-Interpreters , 2015, ICLR.
[215] Sergey Levine,et al. End-to-End Training of Deep Visuomotor Policies , 2015, J. Mach. Learn. Res..
[216] Katja Hofmann,et al. The Malmo Platform for Artificial Intelligence Experimentation , 2016, IJCAI.
[217] Quoc V. Le,et al. Neural Programmer: Inducing Latent Programs with Gradient Descent , 2015, ICLR.
[218] Koray Kavukcuoglu,et al. PGQ: Combining policy gradient and Q-learning , 2016, ArXiv.
[219] Tom Schaul,et al. Prioritized Experience Replay , 2015, ICLR.
[220] Li Li,et al. Traffic signal timing via deep reinforcement learning , 2016, IEEE/CAA Journal of Automatica Sinica.
[221] Sergey Levine,et al. Continuous Deep Q-Learning with Model-based Acceleration , 2016, ICML.
[222] Joshua B. Tenenbaum,et al. Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation , 2016, NIPS.
[223] Florian Richoux,et al. TorchCraft: a Library for Machine Learning Research on Real-Time Strategy Games , 2016, ArXiv.
[224] Alejandro Hernández Cordero,et al. Extending the OpenAI Gym for robotics: a toolkit for reinforcement learning using ROS and Gazebo , 2016, ArXiv.
[225] Marc G. Bellemare,et al. Safe and Efficient Off-Policy Reinforcement Learning , 2016, NIPS.
[226] Julian Togelius,et al. The 2014 General Video Game Playing Competition , 2016, IEEE Transactions on Computational Intelligence and AI in Games.
[227] John Schulman,et al. Concrete Problems in AI Safety , 2016, ArXiv.
[228] Philip S. Thomas,et al. Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning , 2016, ICML.
[229] Shie Mannor,et al. Adaptive Skills Adaptive Partitions (ASAP) , 2016, NIPS.
[230] Sergey Levine,et al. Unsupervised Learning for Physical Interaction through Video Prediction , 2016, NIPS.
[231] Zoubin Ghahramani,et al. Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning , 2015, ICML.
[232] Wojciech Zaremba,et al. OpenAI Gym , 2016, ArXiv.
[233] Razvan Pascanu,et al. A simple neural network module for relational reasoning , 2017, NIPS.
[234] Marc G. Bellemare,et al. The Reactor: A Sample-Efficient Actor-Critic Architecture , 2017, ArXiv.
[235] Joelle Pineau,et al. An Actor-Critic Algorithm for Sequence Prediction , 2016, ICLR.
[236] Tuomas Sandholm,et al. Libratus: The Superhuman AI for No-Limit Poker , 2017, IJCAI.
[237] Damien Ernst,et al. Approximate Bayes Optimal Policy Search using Neural Networks , 2017, ICAART.
[238] Doina Precup,et al. The Option-Critic Architecture , 2016, AAAI.
[239] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[240] Alexei A. Efros,et al. Curiosity-Driven Exploration by Self-Supervised Prediction , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[241] Zeb Kurth-Nelson,et al. Learning to reinforcement learn , 2016, CogSci.
[242] Louis Wehenkel,et al. Machine learning of real-time power systems reliability management response , 2017, 2017 IEEE Manchester PowerTech.
[243] Li Fei-Fei,et al. Inferring and Executing Programs for Visual Reasoning , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[244] Tom Schaul,et al. The Predictron: End-To-End Learning and Planning , 2016, ICML.
[245] Shimon Whiteson,et al. Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning , 2017, ICML.
[246] Shie Mannor,et al. A Deep Hierarchical Approach to Lifelong Learning in Minecraft , 2016, AAAI.
[247] Pieter Abbeel,et al. Stochastic Neural Networks for Hierarchical Reinforcement Learning , 2016, ICLR.
[248] Learning Robust Dialog Policies in Noisy Environments , 2017, ArXiv.
[249] Chris Sauer,et al. Beating Atari with Natural Language Guided Reinforcement Learning , 2017, ArXiv.
[250] Sepp Hochreiter,et al. Self-Normalizing Neural Networks , 2017, NIPS.
[251] Vladlen Koltun,et al. Learning to Act by Predicting the Future , 2016, ICLR.
[252] Xi Chen,et al. Evolution Strategies as a Scalable Alternative to Reinforcement Learning , 2017, ArXiv.
[253] D. Hassabis,et al. Neuroscience-Inspired Artificial Intelligence , 2017, Neuron.
[254] Razvan Pascanu,et al. Sim-to-Real Robot Learning from Pixels with Progressive Nets , 2016, CoRL.
[255] Razvan Pascanu,et al. Overcoming catastrophic forgetting in neural networks , 2016, Proceedings of the National Academy of Sciences.
[256] Yoshua Bengio. The Consciousness Prior , 2017, ArXiv.
[257] Sergey Levine,et al. Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic , 2016, ICLR.
[258] Abhinav Gupta,et al. Learning to fly by crashing , 2017, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[259] Samy Bengio,et al. Understanding deep learning requires rethinking generalization , 2016, ICLR.
[260] Yuandong Tian,et al. ELF: An Extensive, Lightweight and Flexible Research Platform for Real-time Strategy Games , 2017, NIPS.
[261] Shane Legg,et al. Deep Reinforcement Learning from Human Preferences , 2017, NIPS.
[262] Marcin Andrychowicz,et al. One-Shot Imitation Learning , 2017, NIPS.
[263] Wojciech Zaremba,et al. Domain randomization for transferring deep neural networks from simulation to the real world , 2017, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[264] Razvan Pascanu,et al. Imagination-Augmented Agents for Deep Reinforcement Learning , 2017, NIPS.
[265] Tom Schaul,et al. StarCraft II: A New Challenge for Reinforcement Learning , 2017, ArXiv.
[266] Guan Wang,et al. Interactive Learning from Policy-Dependent Human Feedback , 2017, ICML.
[267] Quoc V. Le,et al. Neural Architecture Search with Reinforcement Learning , 2016, ICLR.
[268] Vincent François-Lavet,et al. Contributions to deep reinforcement learning and its applications in smartgrids , 2017 .
[269] Nando de Freitas,et al. Sample Efficient Actor-Critic with Experience Replay , 2016, ICLR.
[270] Sebastian Ruder,et al. An Overview of Multi-Task Learning in Deep Neural Networks , 2017, ArXiv.
[271] Dawn Xiaodong Song,et al. Learning Neural Programs To Parse Programs , 2017, ArXiv.
[272] Richard E. Turner,et al. Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning , 2017, NIPS.
[273] Cewu Lu,et al. Virtual to Real Reinforcement Learning for Autonomous Driving , 2017, BMVC.
[274] Ali Farhadi,et al. Target-driven visual navigation in indoor scenes using deep reinforcement learning , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).
[275] Yi Wu,et al. Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments , 2017, NIPS.
[276] Marlos C. Machado,et al. A Laplacian Framework for Option Discovery in Reinforcement Learning , 2017, ICML.
[277] Emma Brunskill,et al. Sample Efficient Feature Selection for Factored MDPs , 2017, ArXiv.
[278] Sergey Levine,et al. Reinforcement Learning with Deep Energy-Based Policies , 2017, ICML.
[279] Dileep George,et al. Schema Networks: Zero-shot Transfer with a Generative Causal Model of Intuitive Physics , 2017, ICML.
[280] Samuel J. Gershman,et al. Predictive representations can link model-based reinforcement learning to model-free mechanisms , 2017 .
[281] Jun Wang,et al. Multiagent Bidirectionally-Coordinated Nets for Learning to Play StarCraft Combat Games , 2017, ArXiv.
[282] Glen Berseth,et al. DeepLoco: dynamic locomotion skills using hierarchical deep reinforcement learning , 2017, ACM Trans. Graph..
[283] Christopher Burgess,et al. DARLA: Improving Zero-Shot Transfer in Reinforcement Learning , 2017, ICML.
[284] Kevin Waugh,et al. DeepStack: Expert-level artificial intelligence in heads-up no-limit poker , 2017, Science.
[285] Tsuneo Kato,et al. “Re:ROS”: Prototyping of Reinforcement Learning Environment for Asynchronous Cognitive Architecture , 2017, BICA 2017.
[286] Yuandong Tian,et al. Training Agent for First-Person Shooter Game with Actor-Critic Curriculum Learning , 2016, ICLR.
[287] Daan Wierstra,et al. Variational Intrinsic Control , 2016, ICLR.
[288] Alex Graves,et al. Video Pixel Networks , 2016, ICML.
[289] Sergey Ioffe,et al. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning , 2016, AAAI.
[290] Daan Wierstra,et al. Recurrent Environment Simulators , 2017, ICLR.
[291] Razvan Pascanu,et al. Learning model-based planning from scratch , 2017, ArXiv.
[292] Georg Ostrovski,et al. Count-Based Exploration with Neural Density Models , 2017, ICML.
[293] Marc G. Bellemare,et al. A Distributional Perspective on Reinforcement Learning , 2017, ICML.
[294] Sergey Levine,et al. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks , 2017, ICML.
[295] Satinder Singh,et al. Value Prediction Network , 2017, NIPS.
[296] Yee Whye Teh,et al. Distral: Robust multitask reinforcement learning , 2017, NIPS.
[297] Pieter Abbeel,et al. Equivalence Between Policy Gradients and Soft Q-Learning , 2017, ArXiv.
[298] Razvan Pascanu,et al. Learning to Navigate in Complex Environments , 2016, ICLR.
[299] Gregory Dudek,et al. Benchmark Environments for Multitask Learning in Continuous Domains , 2017, ArXiv.
[300] Samy Bengio,et al. Neural Combinatorial Optimization with Reinforcement Learning , 2016, ICLR.
[301] Tom Schaul,et al. Reinforcement Learning with Unsupervised Auxiliary Tasks , 2016, ICLR.
[302] Shimon Whiteson,et al. TreeQN and ATreeC: Differentiable Tree Planning for Deep Reinforcement Learning , 2017, ICLR 2018.
[303] Peter Henderson,et al. Reproducibility of Benchmarked Deep Reinforcement Learning Tasks for Continuous Control , 2017, ArXiv.
[304] Alec Radford,et al. Proximal Policy Optimization Algorithms , 2017, ArXiv.
[305] Guillaume Lample,et al. Playing FPS Games with Deep Reinforcement Learning , 2016, AAAI.
[306] Sergey Levine,et al. Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).
[307] Quoc V. Le,et al. Large-Scale Evolution of Image Classifiers , 2017, ICML.
[308] Sergey Levine,et al. (CAD)$^2$RL: Real Single-Image Flight without a Single Real Image , 2016, Robotics: Science and Systems.
[309] Youyong Kong,et al. Deep Direct Reinforcement Learning for Financial Signal Representation and Trading , 2017, IEEE Transactions on Neural Networks and Learning Systems.
[310] Pablo Samuel Castro,et al. Dopamine: A Research Framework for Deep Reinforcement Learning , 2018, ArXiv.
[311] Tom Schaul,et al. Rainbow: Combining Improvements in Deep Reinforcement Learning , 2017, AAAI.
[312] Marcin Andrychowicz,et al. Parameter Space Noise for Exploration , 2017, ICLR.
[313] Shimon Whiteson,et al. Learning with Opponent-Learning Awareness , 2017, AAMAS.
[314] Daniel L. K. Yamins,et al. Learning to Play with Intrinsically-Motivated Self-Aware Agents , 2018, NeurIPS.
[315] Marwan Mattar,et al. Unity: A General Platform for Intelligent Agents , 2018, ArXiv.
[316] Shimon Whiteson,et al. Counterfactual Multi-Agent Policy Gradients , 2017, AAAI.
[317] Nando de Freitas,et al. Intrinsic Social Motivation via Causal Influence in Multi-Agent RL , 2018, ArXiv.
[318] Philip Bachman,et al. Deep Reinforcement Learning that Matters , 2017, AAAI.
[319] Marlos C. Machado,et al. Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents , 2017, J. Artif. Intell. Res..
[320] Samy Bengio,et al. A Study on Overfitting in Deep Reinforcement Learning , 2018, ArXiv.
[321] Pieter Abbeel,et al. Automatic Goal Generation for Reinforcement Learning Agents , 2017, ICML.
[322] Shane Legg,et al. Noisy Networks for Exploration , 2017, ICLR.
[323] Atil Iscen,et al. Sim-to-Real: Learning Agile Locomotion For Quadruped Robots , 2018, Robotics: Science and Systems.
[324] Nando de Freitas,et al. Playing hard exploration games by watching YouTube , 2018, NeurIPS.
[325] Sergio Gomez Colmenarejo,et al. One-Shot High-Fidelity Imitation: Training Large-Scale Deep Nets with RL , 2018, ArXiv.
[326] Sergey Levine,et al. Imitation from Observation: Learning to Imitate Behaviors from Raw Video via Context Translation , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).
[327] Xiaohui Ye,et al. Horizon: Facebook's Open Source Applied Reinforcement Learning Platform , 2018, ArXiv.
[328] Joel Z. Leibo,et al. Human-level performance in first-person multiplayer games with population-based deep reinforcement learning , 2018, ArXiv.
[329] Hyrum S. Anderson,et al. The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation , 2018, ArXiv.
[330] Joelle Pineau,et al. A Dissection of Overfitting and Generalization in Continuous Reinforcement Learning , 2018, ArXiv.
[331] Yee Whye Teh,et al. An Analysis of Categorical Distributional Reinforcement Learning , 2018, AISTATS.
[332] Peter Stone,et al. Deep TAMER: Interactive Agent Shaping in High-Dimensional State Spaces , 2017, AAAI.
[333] Guy Lever,et al. Value-Decomposition Networks For Cooperative Multi-Agent Learning Based On Team Reward , 2018, AAMAS.
[334] Martin A. Riedmiller,et al. Learning by Playing - Solving Sparse Reward Tasks from Scratch , 2018, ICML.
[335] Sergey Levine,et al. Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).
[336] Marcin Andrychowicz,et al. Asymmetric Actor Critic for Image-Based Robot Learning , 2017, Robotics: Science and Systems.
[337] Sergey Levine,et al. QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation , 2018, CoRL.
[338] Marc G. Bellemare,et al. The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning , 2017, ICLR.
[339] Joelle Pineau,et al. Decoupling Dynamics and Reward for Transfer Learning , 2018, ICLR.
[340] Marc G. Bellemare,et al. Distributional Reinforcement Learning with Quantile Regression , 2017, AAAI.
[341] Vincent François-Lavet,et al. DeeR: a Deep Reinforcement learning library , 2016.
[342] Amos J. Storkey,et al. Exploration by Random Network Distillation , 2018, ICLR.
[343] Wojciech Czarnecki,et al. Multi-task Deep Reinforcement Learning with PopArt , 2018, AAAI.
[344] Yevgen Chebotar,et al. Closing the Sim-to-Real Loop: Adapting Simulation Randomization with Real World Experience , 2018, 2019 International Conference on Robotics and Automation (ICRA).
[345] Elliot Meyerson,et al. Evolving Deep Neural Networks , 2017, Artificial Intelligence in the Age of Neural Networks and Brain Computing.
[346] Joelle Pineau,et al. Combined Reinforcement Learning via Abstract Representations , 2018, AAAI.
[347] Damien Ernst,et al. On overfitting and asymptotic bias in batch reinforcement learning with partial observability , 2017, J. Artif. Intell. Res..
[348] Nando de Freitas,et al. Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning , 2018, ICML.
[349] Yuxi Li,et al. Deep Reinforcement Learning , 2018, Reinforcement Learning for Cyber-Physical Systems.
[350] Marc Pollefeys,et al. Episodic Curiosity through Reachability , 2018, ICLR.
[351] Nick Bostrom. Superintelligence: Paths, Dangers, Strategies , 2014.
[352] John Schulman,et al. Teacher–Student Curriculum Learning , 2017, IEEE Transactions on Neural Networks and Learning Systems.