Deep Reinforcement Learning

We discuss deep reinforcement learning in an overview style: we draw a big picture and fill it in with details. We discuss six core elements, six important mechanisms, and twelve applications, focusing on contemporary work while placing it in historical context. We start with background on artificial intelligence, machine learning, deep learning, and reinforcement learning (RL), along with pointers to resources. Next we discuss the core elements of RL: value function, policy, reward, model, exploration vs. exploitation, and representation. Then we discuss important mechanisms for RL: attention and memory, unsupervised learning, hierarchical RL, multi-agent RL, relational RL, and learning to learn. After that, we discuss RL applications: games, robotics, natural language processing (NLP), computer vision, finance, business management, healthcare, education, energy, transportation, computer systems, and science, engineering, and art. Finally, we summarize briefly, discuss challenges and opportunities, and close with an epilogue.
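
To make the core elements named above concrete, the following is a minimal, self-contained sketch (not taken from the survey itself) of tabular Q-learning with an epsilon-greedy policy on a hypothetical five-state chain MDP; the environment, state/action counts, and hyperparameters are all illustrative assumptions. It shows a value function (the Q table), a policy (greedy over Q), rewards from the environment, and the exploration vs. exploitation trade-off (epsilon).

import random

N_STATES, N_ACTIONS = 5, 2       # hypothetical chain: action 0 = left, 1 = right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

def step(state, action):
    """Toy environment (an assumption for illustration): reward 1 only at the right end."""
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    done = nxt == N_STATES - 1
    return nxt, reward, done

Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]   # action-value function Q(s, a)

for episode in range(500):
    s, done = 0, False
    while not done:
        # exploration vs. exploitation: explore with probability epsilon
        if random.random() < EPSILON:
            a = random.randrange(N_ACTIONS)
        else:
            a = max(range(N_ACTIONS), key=lambda a_: Q[s][a_])
        s2, r, done = step(s, a)
        # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a')
        target = r + GAMMA * max(Q[s2])
        Q[s][a] += ALPHA * (target - Q[s][a])
        s = s2

# the learned policy is greedy with respect to the value function
greedy_policy = [max(range(N_ACTIONS), key=lambda a_: Q[s_][a_]) for s_ in range(N_STATES)]
print("greedy policy (1 = move right):", greedy_policy)

Run as written, this converges to a policy that always moves right, the optimal behavior for this toy chain; deep RL replaces the Q table with a neural network approximator, which is where the representation element enters.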

[1]  Henry C. Ellis,et al.  Transfer of Learning , 2021, Research in Mathematics Education.

[2]  Leslie G. Valiant,et al.  A theory of the learnable , 1984, STOC '84.

[3]  J. Murphy Technical Analysis of the Futures Markets: A Comprehensive Guide to Trading Methods and Applications , 1986 .

[4]  J. Hull Options, Futures, and Other Derivatives , 1989 .

[5]  Kurt Hornik,et al.  Multilayer feedforward networks are universal approximators , 1989, Neural Networks.

[6]  Raymond Kurzweil,et al.  Age of intelligent machines , 1990 .

[7]  Richard S. Sutton,et al.  Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming , 1990, ML.

[8]  Jürgen Schmidhuber,et al.  A possibility for implementing curiosity and boredom in model-building neural controllers , 1991 .

[9]  Richard S. Sutton,et al.  Reinforcement Learning is Direct Adaptive Optimal Control , 1992, 1991 American Control Conference.

[10]  Yoshua Bengio,et al.  Learning a synaptic learning rule , 1991, IJCNN-91-Seattle International Joint Conference on Neural Networks.

[11]  Geoffrey E. Hinton,et al.  Feudal Reinforcement Learning , 1992, NIPS.

[12]  Richard S. Sutton,et al.  Adapting Bias by Gradient Descent: An Incremental Version of Delta-Bar-Delta , 1992, AAAI.

[13]  Ming Tan,et al.  Multi-Agent Reinforcement Learning: Independent versus Cooperative Agents , 1997, ICML.

[14]  Ming Li,et al.  An Introduction to Kolmogorov Complexity and Its Applications , 2019, Texts in Computer Science.

[15]  Peter Dayan,et al.  Improving Generalization for Temporal Difference Learning: The Successor Representation , 1993, Neural Computation.

[16]  Gerald Tesauro,et al.  TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play , 1994, Neural Computation.

[17]  Michael L. Littman,et al.  Markov Games as a Framework for Multi-Agent Reinforcement Learning , 1994, ICML.

[18]  Deborah Silver,et al.  Feature Visualization , 1994, Scientific Visualization.

[19]  Leemon C. Baird,et al.  Residual Algorithms: Reinforcement Learning with Function Approximation , 1995, ICML.

[20]  John N. Tsitsiklis,et al.  Analysis of Temporal-Diffference Learning with Function Approximation , 1996, NIPS.

[21]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[22]  Stuart J. Russell,et al.  Reinforcement Learning with Hierarchies of Machines , 1997, NIPS.

[23]  Ralph Neuneier,et al.  Enhancing Q-Learning for Optimal Asset Allocation , 1997, NIPS.

[24]  Jonathan Schaeffer One Jump Ahead , 1997 .

[25]  T. Crystal Conversational speech recognition , 1997 .

[26]  Randy Goebel,et al.  Computational intelligence - a logical approach , 1998 .

[27]  Leslie Pack Kaelbling,et al.  Planning and Acting in Partially Observable Stochastic Domains , 1998, Artif. Intell..

[28]  Vladimir Vapnik,et al.  Statistical learning theory , 1998 .

[29]  Doina Precup,et al.  Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning , 1999, Artif. Intell..

[30]  Andrew Y. Ng,et al.  Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping , 1999, ICML.

[31]  Yishay Mansour,et al.  Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.

[32]  Manuela M. Veloso,et al.  Multiagent Systems: A Survey from a Machine Learning Perspective , 2000, Auton. Robots.

[33]  Thomas G. Dietterich Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition , 1999, J. Artif. Intell. Res..

[34]  Andrew W. Lo,et al.  Foundations of Technical Analysis: Computational Algorithms, Statistical Inference, and Empirical Implementation , 2000 .

[35]  Andrew Y. Ng,et al.  Pharmacokinetics of a novel formulation of ivermectin after administration to goats , 2000, ICML.

[36]  Sepp Hochreiter,et al.  Learning to Learn Using Gradient Descent , 2001, ICANN.

[37]  John N. Tsitsiklis,et al.  Regression methods for pricing complex American-style options , 2001, IEEE Trans. Neural Networks.

[38]  Francis A. Longstaff,et al.  Valuing American Options by Simulation: A Simple Least-Squares Approach , 2001 .

[39]  Richard S. Sutton,et al.  Predictive Representations of State , 2001, NIPS.

[40]  Matthew Saffell,et al.  Learning to trade via direct reinforcement , 2001, IEEE Trans. Neural Networks.

[41]  Matthew L. Ginsberg,et al.  GIB: Imperfect Information in a Computationally Challenging Game , 2011, J. Artif. Intell. Res..

[42]  Sham M. Kakade,et al.  A Natural Policy Gradient , 2001, NIPS.

[43]  Sanjoy Dasgupta,et al.  Off-Policy Temporal Difference Learning with Function Approximation , 2001, ICML.

[44]  Nikolaus Hansen,et al.  Completely Derandomized Self-Adaptation in Evolution Strategies , 2001, Evolutionary Computation.

[45]  Luis M. Viceira,et al.  Appendix for "Strategic Asset Allocation: Portfolio Choice for Long-Term Investors" , 2001 .

[46]  Manuela M. Veloso,et al.  Multiagent learning using a variable learning rate , 2002, Artif. Intell..

[47]  Peter Auer,et al.  The Nonstochastic Multiarmed Bandit Problem , 2002, SIAM J. Comput..

[48]  Murray Campbell,et al.  Deep Blue , 2002, Artif. Intell..

[49]  Bernhard Schölkopf,et al.  Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond , 2005, IEEE Transactions on Neural Networks.

[50]  John Langford,et al.  Approximately Optimal Approximate Reinforcement Learning , 2002, ICML.

[51]  Ronen I. Brafman,et al.  R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning , 2001, J. Mach. Learn. Res..

[52]  Carlos Guestrin,et al.  Generalizing plans to new environments in relational MDPs , 2003, IJCAI 2003.

[53]  Sridhar Mahadevan,et al.  Recent Advances in Hierarchical Reinforcement Learning , 2003, Discret. Event Dyn. Syst..

[54]  Paul Glasserman,et al.  Monte Carlo Methods in Financial Engineering , 2003 .

[55]  Michail G. Lagoudakis,et al.  Least-Squares Policy Iteration , 2003, J. Mach. Learn. Res..

[56]  Vijay R. Konda,et al.  OnActor-Critic Algorithms , 2003, SIAM J. Control. Optim..

[57]  Michael P. Wellman,et al.  Nash Q-Learning for General-Sum Stochastic Games , 2003, J. Mach. Learn. Res..

[58]  Nuttapong Chentanez,et al.  Intrinsically Motivated Reinforcement Learning , 2004, NIPS.

[59]  Rich Caruana,et al.  Multitask Learning , 1997, Machine Learning.

[60]  Steven J. Bradtke,et al.  Linear Least-Squares algorithms for temporal difference learning , 2004, Machine Learning.

[61]  Ronald J. Williams,et al.  Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.

[62]  Michael R. James,et al.  Predictive State Representations: A New Theory for Modeling Dynamical Systems , 2004, UAI.

[63]  Hector J. Levesque,et al.  Knowledge Representation and Reasoning , 2004 .

[64]  Tommi S. Jaakkola,et al.  Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms , 2000, Machine Learning.

[65]  Victor R. Lesser,et al.  A survey of multi-agent organizational paradigms , 2004, The Knowledge Engineering Review.

[66]  Long Ji Lin,et al.  Self-improving reactive agents based on reinforcement learning, planning and teaching , 1992, Machine Learning.

[67]  Michael Kearns,et al.  Near-Optimal Reinforcement Learning in Polynomial Time , 1998, Machine Learning.

[68]  A. Lo The Adaptive Markets Hypothesis , 2004 .

[69]  Andrew W. Moore,et al.  Prioritized Sweeping: Reinforcement Learning with Less Data and Less Time , 1993, Machine Learning.

[70]  Robert Givan,et al.  Relational Reinforcement Learning: An Overview , 2004, ICML 2004.

[71]  Richard S. Sutton,et al.  Temporal-Difference Networks , 2004, NIPS.

[72]  Pieter Abbeel,et al.  Apprenticeship learning via inverse reinforcement learning , 2004, ICML.

[73]  Pierre Geurts,et al.  Tree-Based Batch Mode Reinforcement Learning , 2005, J. Mach. Learn. Res..

[74]  Simon Haykin,et al.  Cognitive radio: brain-empowered wireless communications , 2005, IEEE Journal on Selected Areas in Communications.

[75]  Hinrich Schütze,et al.  Introduction to information retrieval , 2008 .

[76]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[77]  Richard S. Sutton,et al.  Learning to predict by the methods of temporal differences , 1988, Machine Learning.

[78]  Christos Dimitrakakis,et al.  TORCS, The Open Racing Car Simulator , 2005 .

[79]  Richard S. Sutton,et al.  Temporal Abstraction in Temporal-difference Networks , 2005, NIPS.

[80]  Xiaotie Deng,et al.  Settling the Complexity of Two-Player Nash Equilibrium , 2006, 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06).

[81]  S. Schaal Dynamic Movement Primitives -A Framework for Motor Control in Humans and Humanoid Robotics , 2006 .

[82]  Rich Caruana,et al.  Model compression , 2006, KDD '06.

[83]  Toby Walsh,et al.  Handbook of Constraint Programming , 2006, Handbook of Constraint Programming.

[84]  Geoffrey E. Hinton,et al.  Reducing the Dimensionality of Data with Neural Networks , 2006, Science.

[85]  Stephen P. Boyd,et al.  Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.

[86]  Sridhar Mahadevan,et al.  Hierarchical multi-agent reinforcement learning , 2001, AGENTS '01.

[87]  Csaba Szepesvári,et al.  Bandit Based Monte-Carlo Planning , 2006, ECML.

[88]  Stefan Schaal,et al.  Reinforcement learning by reward-weighted regression for operational space control , 2007, ICML '07.

[89]  Yoav Shoham,et al.  If multi-agent learning is the answer, what is the question? , 2007, Artif. Intell..

[90]  Ben Taskar,et al.  Introduction to statistical relational learning , 2007 .

[91]  Robert E. Schapire,et al.  A Game-Theoretic Approach to Apprenticeship Learning , 2007, NIPS.

[92]  John Langford,et al.  The Epoch-Greedy Algorithm for Multi-armed Bandits with Side Information , 2007, NIPS.

[93]  Benjamin Recht,et al.  Random Features for Large-Scale Kernel Machines , 2007, NIPS.

[94]  David Thue,et al.  Interactive Storytelling: A Player Modelling Approach , 2007, AIIDE.

[95]  Pierre-Yves Oudeyer,et al.  What is Intrinsic Motivation? A Typology of Computational Approaches , 2007, Frontiers Neurorobotics.

[96]  Sridhar Mahadevan,et al.  Proto-value Functions: A Laplacian Framework for Learning Representation and Control in Markov Decision Processes , 2007, J. Mach. Learn. Res..

[97]  Jonathan Schaeffer,et al.  Checkers Is Solved , 2007, Science.

[98]  Thomas J. Walsh,et al.  Knows what it knows: a framework for self-aware learning , 2008, ICML '08.

[99]  Peter Auer,et al.  Near-optimal Regret Bounds for Reinforcement Learning , 2008, J. Mach. Learn. Res..

[100]  Anind K. Dey,et al.  Maximum Entropy Inverse Reinforcement Learning , 2008, AAAI.

[101]  R. Sutton,et al.  A convergent O ( n ) algorithm for off-policy temporal-difference learning with linear function approximation , 2008, NIPS 2008.

[102]  Nikos A. Vlassis,et al.  Optimal and Approximate Q-value Functions for Decentralized POMDPs , 2008, J. Artif. Intell. Res..

[103]  Michael L. Littman,et al.  An analysis of model-based Interval Estimation for Markov Decision Processes , 2008, J. Comput. Syst. Sci..

[104]  Michael H. Bowling,et al.  Apprenticeship learning using linear programming , 2008, ICML '08.

[105]  Yoav Shoham,et al.  Essentials of Game Theory: A Concise Multidisciplinary Introduction , 2008, Essentials of Game Theory: A Concise Multidisciplinary Introduction.

[106]  Bart De Schutter,et al.  A Comprehensive Survey of Multiagent Reinforcement Learning , 2008, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).

[107]  Yoav Shoham,et al.  Multiagent Systems - Algorithmic, Game-Theoretic, and Logical Foundations , 2009 .

[108]  Xiaojin Zhu,et al.  Introduction to Semi-Supervised Learning , 2009, Synthesis Lectures on Artificial Intelligence and Machine Learning.

[109]  John Langford,et al.  Search-based structured prediction , 2009, Machine Learning.

[110]  Ah Chung Tsoi,et al.  The Graph Neural Network Model , 2009, IEEE Transactions on Neural Networks.

[111]  Andrew Y. Ng,et al.  Near-Bayesian exploration in polynomial time , 2009, ICML '09.

[112]  Peter Stone,et al.  Transfer Learning for Reinforcement Learning Domains: A Survey , 2009, J. Mach. Learn. Res..

[113]  Brett Browning,et al.  A survey of robot learning from demonstration , 2009, Robotics Auton. Syst..

[114]  Dale Schuurmans,et al.  Learning Exercise Policies for American Options , 2009, AISTATS.

[115]  VARUN CHANDOLA,et al.  Anomaly detection: A survey , 2009, CSUR.

[116]  Shimon Whiteson,et al.  A theoretical and empirical analysis of Expected Sarsa , 2009, 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning.

[117]  Jason Weston,et al.  Curriculum learning , 2009, ICML '09.

[118]  Richard L. Lewis,et al.  Where Do Rewards Come From , 2009 .

[119]  Yaoliang Yu,et al.  A General Projection Property for Distribution Families , 2009, NIPS.

[120]  Shalabh Bhatnagar,et al.  Fast gradient-descent methods for temporal-difference learning with linear function approximation , 2009, ICML '09.

[121]  Ricardo Vilalta,et al.  Metalearning - Applications to Data Mining , 2008, Cognitive Technologies.

[122]  Robert Tibshirani,et al.  The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Edition , 2001, Springer Series in Statistics.

[123]  Nir Friedman,et al.  Probabilistic Graphical Models - Principles and Techniques , 2009 .

[124]  Masashi Sugiyama,et al.  Nonparametric Return Distribution Approximation for Reinforcement Learning , 2010, ICML.

[125]  Jürgen Schmidhuber,et al.  Formal Theory of Creativity, Fun, and Intrinsic Motivation (1990–2010) , 2010, IEEE Transactions on Autonomous Mental Development.

[126]  Richard L. Lewis,et al.  Intrinsically Motivated Reinforcement Learning: An Evolutionary Perspective , 2010, IEEE Transactions on Autonomous Mental Development.

[127]  Yasemin Altun,et al.  Relative Entropy Policy Search , 2010 .

[128]  Hado van Hasselt,et al.  Double Q-learning , 2010, NIPS.

[129]  Vern Paxson,et al.  Outside the Closed World: On Using Machine Learning for Network Intrusion Detection , 2010, 2010 IEEE Symposium on Security and Privacy.

[130]  Masashi Sugiyama,et al.  Parametric Return Density Estimation for Reinforcement Learning , 2010, UAI.

[131]  Joelle Pineau,et al.  Informing sequential clinical decision-making through reinforcement learning: an empirical study , 2010, Machine Learning.

[132]  Qiang Yang,et al.  A Survey on Transfer Learning , 2010, IEEE Transactions on Knowledge and Data Engineering.

[133]  Anind K. Dey,et al.  Modeling Interaction via the Principle of Maximum Causal Entropy , 2010, ICML.

[134]  A. Lo,et al.  Consumer Credit Risk Models Via Machine-Learning Algorithms , 2010 .

[135]  Warren B. Powell,et al.  Feature Article - Merging AI and OR to Solve High-Dimensional Stochastic Optimization Problems Using Approximate Dynamic Programming , 2010, INFORMS J. Comput..

[136]  Csaba Szepesvári,et al.  Algorithms for Reinforcement Learning , 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.

[137]  Wei Chu,et al.  A contextual-bandit approach to personalized news article recommendation , 2010, WWW '10.

[138]  Carl E. Rasmussen,et al.  PILCO: A Model-Based and Data-Efficient Approach to Policy Search , 2011, ICML.

[139]  Warren B. Powell,et al.  Adaptive Stochastic Control for the Smart Grid , 2011, Proceedings of the IEEE.

[140]  Geoffrey J. Gordon,et al.  A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning , 2010, AISTATS.

[141]  Warren B. Powell,et al.  “Approximate dynamic programming: Solving the curses of dimensionality” by Warren B. Powell , 2007, Wiley Series in Probability and Statistics.

[142]  Regina Barzilay,et al.  Learning to Win by Reading Manuals in a Monte-Carlo Framework , 2011, ACL.

[143]  Jeffrey Pennington,et al.  Semi-Supervised Recursive Autoencoders for Predicting Sentiment Distributions , 2011, EMNLP.

[144]  Martha White,et al.  Linear Off-Policy Actor-Critic , 2012, ICML.

[145]  Lihong Li,et al.  Sample Complexity Bounds of Exploration , 2012, Reinforcement Learning.

[146]  Pedro M. Domingos A few useful things to know about machine learning , 2012, Commun. ACM.

[147]  Kevin P. Murphy,et al.  Machine learning - a probabilistic perspective , 2012, Adaptive computation and machine learning series.

[148]  D. Yen,et al.  Identifying the signs of fraudulent accounts using data mining techniques , 2012, Comput. Hum. Behav..

[149]  Xi Fang,et al.  3. Full Four-channel 6.3-gb/s 60-ghz Cmos Transceiver with Low-power Analog and Digital Baseband Circuitry 7. Smart Grid — the New and Improved Power Grid: a Survey , 2022 .

[150]  Marc Toussaint,et al.  On Stochastic Optimal Control and Reinforcement Learning by Approximate Inference , 2012, Robotics: Science and Systems.

[151]  M. Kosorok,et al.  Q-LEARNING WITH CENSORED DATA. , 2012, Annals of statistics.

[152]  Michèle Sebag,et al.  The grand challenge of computer Go , 2012, Commun. ACM.

[153]  Simon M. Lucas,et al.  A Survey of Monte Carlo Tree Search Methods , 2012, IEEE Transactions on Computational Intelligence and AI in Games.

[154]  Masashi Sugiyama,et al.  Artist Agent: A Reinforcement Learning Approach to Automatic Stroke Generation in Oriental Ink Painting , 2012, ICML.

[155]  Tara N. Sainath,et al.  Deep Neural Networks for Acoustic Modeling in Speech Recognition , 2012 .

[156]  Michael H. Bowling,et al.  Tractable Objectives for Robust Policy Optimization , 2012, NIPS.

[157]  Pascal Vincent,et al.  Representation Learning: A Review and New Perspectives , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[158]  Jeffrey Dean,et al.  Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.

[159]  Geoffrey E. Hinton,et al.  Speech recognition with deep recurrent neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[160]  Santiago Ontañón,et al.  A Survey of Real-Time Strategy Game AI Research and Competition in StarCraft , 2013, IEEE Transactions on Computational Intelligence and AI in Games.

[161]  Jan Peters,et al.  Reinforcement learning in robotics: A survey , 2013, Int. J. Robotics Res..

[162]  Christopher Potts,et al.  Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank , 2013, EMNLP.

[163]  David Silver,et al.  Concurrent Reinforcement Learning from Customer Interactions , 2013, ICML.

[164]  Liljana Gavrilovska,et al.  Learning and Reasoning in Cognitive Radio Networks , 2013, IEEE Communications Surveys & Tutorials.

[165]  Marc'Aurelio Ranzato,et al.  Building high-level features using large scale unsupervised learning , 2011, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[166]  Baher Abdulhai,et al.  Multiagent Reinforcement Learning for Integrated Network of Adaptive Traffic Signal Controllers (MARLIN-ATSC): Methodology and Large-Scale Application on Downtown Toronto , 2013, IEEE Transactions on Intelligent Transportation Systems.

[167]  Milica Gasic,et al.  POMDP-Based Statistical Spoken Dialog Systems: A Review , 2013, Proceedings of the IEEE.

[168]  Joelle Pineau,et al.  Learning from Limited Demonstrations , 2013, NIPS.

[169]  Phil Blunsom,et al.  Recurrent Continuous Translation Models , 2013, EMNLP.

[170]  Li Deng,et al.  Speech-Centric Information Processing: An Optimization-Oriented Approach , 2013, Proceedings of the IEEE.

[171]  Daniela M. Witten,et al.  An Introduction to Statistical Learning: with Applications in R , 2013 .

[172]  Jan Peters,et al.  A Survey on Policy Search for Robotics , 2013, Found. Trends Robotics.

[173]  Xiao Li,et al.  Machine Learning Paradigms for Speech Recognition: An Overview , 2013, IEEE Transactions on Audio, Speech, and Language Processing.

[174]  Max Kuhn,et al.  Applied Predictive Modeling , 2013 .

[175]  Shih-Chieh Huang,et al.  MoHex 2.0: A Pattern-Based MCTS Hex Player , 2013, Computers and Games.

[176]  Léon Bottou,et al.  From machine learning to machine reasoning , 2011, Machine Learning.

[177]  Tom Fawcett,et al.  Data science for business , 2013 .

[178]  Andrew G. Barto,et al.  Intrinsic Motivation and Reinforcement Learning , 2013, Intrinsically Motivated Learning in Natural and Artificial Systems.

[179]  Suchi Saria,et al.  A $3 Trillion Challenge to Computational Scientists: Transforming Healthcare Delivery , 2014, IEEE Intelligent Systems.

[180]  Yoshua Bengio,et al.  How transferable are features in deep neural networks? , 2014, NIPS.

[181]  Shalabh Bhatnagar,et al.  Universal Option Models , 2014, NIPS.

[182]  Yoshua Bengio,et al.  Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.

[183]  Sergey Levine,et al.  Learning Complex Neural Network Policies with Trajectory Optimization , 2014, ICML.

[184]  Peter Dayan,et al.  Bayes-Adaptive Simulation-based Search with Value Function Approximation , 2014, NIPS.

[185]  Richard S. Sutton,et al.  Weighted importance sampling for off-policy learning with linear function approximation , 2014, NIPS.

[186]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[187]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[188]  Max Welling,et al.  Auto-Encoding Variational Bayes , 2013, ICLR.

[189]  Max Welling,et al.  Semi-supervised Learning with Deep Generative Models , 2014, NIPS.

[190]  Guy Lever,et al.  Deterministic Policy Gradient Algorithms , 2014, ICML.

[191]  Wu He,et al.  Internet of Things in Industries: A Survey , 2014, IEEE Transactions on Industrial Informatics.

[192]  Hwee Pink Tan,et al.  Machine Learning in Wireless Sensor Networks: Algorithms, Strategies, and Applications , 2014, IEEE Communications Surveys & Tutorials.

[193]  S. Murphy,et al.  Dynamic Treatment Regimes. , 2014, Annual review of statistics and its application.

[194]  Alex Graves,et al.  Recurrent Models of Visual Attention , 2014, NIPS.

[195]  Yoshua Bengio,et al.  Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling , 2014, ArXiv.

[196]  Zoran Popovic,et al.  Trading Off Scientific Knowledge and User Learning with Multi-Armed Bandits , 2014, EDM.

[197]  Honglak Lee,et al.  Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning , 2014, NIPS.

[198]  Trevor Darrell,et al.  DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition , 2013, ICML.

[199]  Alex Graves,et al.  Neural Turing Machines , 2014, ArXiv.

[200]  Quoc V. Le,et al.  Sequence to Sequence Learning with Neural Networks , 2014, NIPS.

[201]  Joan Bruna,et al.  Intriguing properties of neural networks , 2013, ICLR.

[202]  Shie Mannor,et al.  Bayesian Reinforcement Learning: A Survey , 2015, Found. Trends Mach. Learn..

[203]  Zheng Wen,et al.  Optimal Demand Response Using Device-Based Reinforcement Learning , 2014, IEEE Transactions on Smart Grid.

[204]  Philip S. Thomas,et al.  Personalized Ad Recommendation Systems for Life-Time Value Optimization with Guarantees , 2015, IJCAI.

[205]  Geoffrey E. Hinton,et al.  Distilling the Knowledge in a Neural Network , 2015, ArXiv.

[206]  Neil Burch,et al.  Heads-up limit hold’em poker is solved , 2015, Science.

[207]  Christopher D. Manning,et al.  Advances in natural language processing , 2015, Science.

[208]  Jürgen Schmidhuber,et al.  Deep learning in neural networks: An overview , 2014, Neural Networks.

[209]  Svetlana Lazebnik,et al.  Active Object Localization with Deep Reinforcement Learning , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[210]  Alessandro Lazaric,et al.  Maximum Entropy Semi-Supervised Inverse Reinforcement Learning , 2015, IJCAI.

[211]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[212]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[213]  Yoshua Bengio,et al.  Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.

[214]  Jason Weston,et al.  End-To-End Memory Networks , 2015, NIPS.

[215]  Kyunghyun Cho,et al.  Natural Language Understanding with Distributed Representation , 2015, ArXiv.

[216]  Tom Schaul,et al.  Universal Value Function Approximators , 2015, ICML.

[217]  Yuval Tassa,et al.  Learning Continuous Control Policies by Stochastic Value Gradients , 2015, NIPS.

[218]  Sergey Levine,et al.  Trust Region Policy Optimization , 2015, ICML.

[219]  Joshua B. Tenenbaum,et al.  Deep Convolutional Inverse Graphics Network , 2015, NIPS.

[220]  Jason Weston,et al.  Memory Networks , 2014, ICLR.

[221]  Michael I. Jordan,et al.  Learning Transferable Features with Deep Adaptation Networks , 2015, ICML.

[222]  Jiajun Wu,et al.  Galileo: Perceiving Physical Object Properties by Integrating a Physics Engine with Deep Learning , 2015, NIPS.

[223]  Koray Kavukcuoglu,et al.  Multiple Object Recognition with Visual Attention , 2014, ICLR.

[224]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[225]  Joshua B. Tenenbaum,et al.  Human-level concept learning through probabilistic program induction , 2015, Science.

[226]  Dianhai Yu,et al.  Multi-Task Learning for Multiple Language Translation , 2015, ACL.

[227]  Navdeep Jaitly,et al.  Pointer Networks , 2015, NIPS.

[228]  Bolei Zhou,et al.  Object Detectors Emerge in Deep Scene CNNs , 2014, ICLR.

[229]  Alex Graves,et al.  DRAW: A Recurrent Neural Network For Image Generation , 2015, ICML.

[230]  Geoffrey E. Hinton,et al.  Deep Learning , 2015, Nature.

[231]  Martin A. Riedmiller,et al.  Embed to Control: A Locally Linear Latent Dynamics Model for Control from Raw Images , 2015, NIPS.

[232]  Jonathon Shlens,et al.  Explaining and Harnessing Adversarial Examples , 2014, ICLR.

[233]  Michael L. Littman,et al.  Reinforcement learning improves behaviour from evaluative feedback , 2015, Nature.

[234]  Javier García,et al.  A comprehensive survey on safe reinforcement learning , 2015, J. Mach. Learn. Res..

[235]  Regina Barzilay,et al.  Language Understanding for Text-based Games using Deep Reinforcement Learning , 2015, EMNLP.

[236]  Luís Paulo Reis,et al.  Model-Based Relative Entropy Stochastic Search , 2016, NIPS.

[237]  Michael I. Jordan,et al.  Machine learning: Trends, perspectives, and prospects , 2015, Science.

[238]  Shane Legg,et al.  Massively Parallel Methods for Deep Reinforcement Learning , 2015, ArXiv.

[239]  Shane Legg,et al.  Human-level control through deep reinforcement learning , 2015, Nature.

[240]  Honglak Lee,et al.  Action-Conditional Video Prediction using Deep Networks in Atari Games , 2015, NIPS.

[241]  Ross A. Knepper,et al.  DeepMPC: Learning Deep Latent Features for Model Predictive Control , 2015, Robotics: Science and Systems.

[242]  Ivan Laptev,et al.  Is object localization for free? - Weakly-supervised learning with convolutional neural networks , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[243]  Peter Stone,et al.  Deep Recurrent Q-Learning for Partially Observable MDPs , 2015, AAAI Fall Symposia.

[244]  Marc G. Bellemare,et al.  The Arcade Learning Environment: An Evaluation Platform for General Agents (Extended Abstract) , 2012, IJCAI.

[245]  M. Kosorok,et al.  Adaptive Treatment Strategies in Practice: Planning Trials and Analyzing Data for Personalized Medicine , 2015 .

[246]  Pedro M. Domingos,et al.  The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World , 2015 .

[247]  Yoshua Bengio,et al.  Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.

[248]  Andrew Zisserman,et al.  Spatial Transformer Networks , 2015, NIPS.

[249]  Michal Valko,et al.  Bayesian Policy Gradient and Actor-Critic Algorithms , 2016, J. Mach. Learn. Res..

[250]  Sergey Levine,et al.  Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization , 2016, ICML.

[251]  Yuandong Tian,et al.  Better Computer Go Player with Neural Network and Long-term Prediction , 2016, ICLR.

[252]  George Saon,et al.  The IBM 2016 English Conversational Telephone Speech Recognition System , 2016, INTERSPEECH.

[253]  Shimon Whiteson,et al.  Learning to Communicate with Deep Multi-Agent Reinforcement Learning , 2016, NIPS.

[254]  Alex Graves,et al.  Conditional Image Generation with PixelCNN Decoders , 2016, NIPS.

[255]  Stefano Ermon,et al.  Model-Free Imitation Learning with Policy Optimization , 2016, ICML.

[256]  Xin Zhang,et al.  End to End Learning for Self-Driving Cars , 2016, ArXiv.

[257]  Samuel Gershman,et al.  Deep Successor Reinforcement Learning , 2016, ArXiv.

[258]  Jianfeng Gao,et al.  Deep Reinforcement Learning for Dialogue Generation , 2016, EMNLP.

[259]  Christopher D. Manning,et al.  Learning Language Games through Interaction , 2016, ACL.

[260]  Pieter Abbeel,et al.  Benchmarking Deep Reinforcement Learning for Continuous Control , 2016, ICML.

[261]  Philip H. S. Torr,et al.  Playing Doom with SLAM-Augmented Deep Reinforcement Learning , 2016, ArXiv.

[262]  Srikanth Kandula,et al.  Resource Management with Deep Reinforcement Learning , 2016, HotNets.

[263]  Michael C. Fu,et al.  Cumulative Prospect Theory Meets Reinforcement Learning: Prediction and Control , 2015, ICML.

[264]  Ruslan Salakhutdinov,et al.  Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning , 2015, ICLR.

[265]  Kavosh Asadi,et al.  A New Softmax Operator for Reinforcement Learning , 2016, ArXiv.

[266]  Anca D. Dragan,et al.  Cooperative Inverse Reinforcement Learning , 2016, NIPS.

[267]  Tie-Yan Liu,et al.  Dual Learning for Machine Translation , 2016, NIPS.

[268]  David Pfau,et al.  Connecting Generative Adversarial Networks and Actor-Critic Methods , 2016, ArXiv.

[269]  Geoffrey E. Hinton,et al.  Attend, Infer, Repeat: Fast Scene Understanding with Generative Models , 2016, NIPS.

[270]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[271]  Alexander J. Smola,et al.  Stacked Attention Networks for Image Question Answering , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[272]  Nan Jiang,et al.  Doubly Robust Off-policy Value Evaluation for Reinforcement Learning , 2015, ICML.

[273]  Ian J. Goodfellow,et al.  Technical Report on the CleverHans v2.1.0 Adversarial Examples Library , 2016 .

[274]  Filip De Turck,et al.  VIME: Variational Information Maximizing Exploration , 2016, NIPS.

[275]  Le Song,et al.  Discriminative Embeddings of Latent Variable Models for Structured Data , 2016, ICML.

[276]  Cristian Sminchisescu,et al.  Reinforcement Learning for Visual Object Detection , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[277]  Marc'Aurelio Ranzato,et al.  Sequence Level Training with Recurrent Neural Networks , 2015, ICLR.

[278]  Pieter Abbeel,et al.  InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets , 2016, NIPS.

[279]  Marcin Andrychowicz,et al.  Learning to learn by gradient descent by gradient descent , 2016, NIPS.

[280]  David Silver,et al.  Deep Reinforcement Learning with Double Q-Learning , 2015, AAAI.

[281]  Marlos C. Machado,et al.  State of the Art Control of Atari Games Using Shallow Reinforcement Learning , 2015, AAMAS.

[282]  Roy Fox,et al.  Taming the Noise in Reinforcement Learning via Soft Updates , 2015, UAI.

[283]  Stefano Ermon,et al.  Generative Adversarial Imitation Learning , 2016, NIPS.

[284]  Benjamin Van Roy,et al.  Deep Exploration via Bootstrapped DQN , 2016, NIPS.

[285]  Alex Graves,et al.  Strategic Attentive Writer for Learning Macro-Actions , 2016, NIPS.

[286]  Murat Kantarcioglu,et al.  Adversarial Data Mining: Big Data Meets Cyber Security , 2016, CCS.

[287]  Tom Schaul,et al.  Dueling Network Architectures for Deep Reinforcement Learning , 2015, ICML.

[288]  Tor Lattimore,et al.  Causal Bandits: Learning Good Interventions via Causal Inference , 2016, NIPS.

[289]  Rob Fergus,et al.  Learning Multiagent Communication with Backpropagation , 2016, NIPS.

[290]  Honglak Lee,et al.  Control of Memory, Active Perception, and Action in Minecraft , 2016, ICML.

[291]  Shan Carter,et al.  Attention and Augmented Recurrent Neural Networks , 2016 .

[292]  Maosong Sun,et al.  Semi-Supervised Learning for Neural Machine Translation , 2016, ACL.

[293]  Justin A. Sirignano Deep learning for limit order books , 2016, Quantitative Finance.

[294]  Jianfeng Gao,et al.  Deep Reinforcement Learning with a Combinatorial Action Space for Predicting Popular Reddit Threads , 2016, EMNLP.

[295]  Jianfeng Gao,et al.  Deep Reinforcement Learning with a Natural Language Action Space , 2015, ACL.

[296]  Jing He,et al.  Policy Networks with Two-Stage Training for Dialogue Systems , 2016, SIGDIAL Conference.

[297]  Masayoshi Tomizuka,et al.  Algorithmic safety measures for intelligent industrial co-robots , 2016, 2016 IEEE International Conference on Robotics and Automation (ICRA).

[298]  Alex Graves,et al.  Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.

[299]  Marina Krakovsky Reinforcement renaissance , 2016, Commun. ACM.

[300]  Tom Schaul,et al.  Unifying Count-Based Exploration and Intrinsic Motivation , 2016, NIPS.

[301]  Joshua B. Tenenbaum,et al.  Building machines that learn and think like people , 2016, Behavioral and Brain Sciences.

[302]  Samy Bengio,et al.  Can Active Memory Replace Attention? , 2016, NIPS.

[303]  Uri Shalit,et al.  Learning Representations for Counterfactual Inference , 2016, ICML.

[304]  Philip Bachman,et al.  Natural Language Comprehension with the EpiReader , 2016, EMNLP.

[305]  Sergio Gomez Colmenarejo,et al.  Hybrid computing using a neural network with dynamic external memory , 2016, Nature.

[306]  David Vandyke,et al.  On-line Active Reward Learning for Policy Optimisation in Spoken Dialogue Systems , 2016, ACL.

[307]  Alexander M. Rush,et al.  Abstractive Sentence Summarization with Attentive Recurrent Neural Networks , 2016, NAACL.

[308]  Nikolaus Hansen,et al.  The CMA Evolution Strategy: A Tutorial , 2016, ArXiv.

[309]  Shuicheng Yan,et al.  Tree-Structured Reinforcement Learning for Sequential Object Localization , 2016, NIPS.

[310]  Pieter Abbeel,et al.  Value Iteration Networks , 2016, NIPS.

[311]  Demis Hassabis,et al.  Mastering the game of Go with deep neural networks and tree search , 2016, Nature.

[312]  Abhinav Gupta Supersizing Self-Supervision: Learning Perception and Action Without Human Supervision , 2016 .

[313]  Jitendra Malik,et al.  Learning to Poke by Poking: Experiential Learning of Intuitive Physics , 2016, NIPS.

[314]  Taghi M. Khoshgoftaar,et al.  A survey of transfer learning , 2016, Journal of Big Data.

[315]  Yu Zhang,et al.  Personalizing a Dialogue System with Transfer Learning , 2016, ArXiv.

[316]  Kyunghyun Cho,et al.  End-to-End Goal-Driven Web Navigation , 2016, NIPS.

[317]  David Silver,et al.  Deep Reinforcement Learning from Self-Play in Imperfect-Information Games , 2016, ArXiv.

[318]  Md. Mustafizur Rahman,et al.  Neural Information Retrieval: A Literature Review , 2016, ArXiv.

[319]  Regina Barzilay,et al.  Improving Information Extraction by Acquiring External Evidence with Reinforcement Learning , 2016, EMNLP.

[320]  Razvan Pascanu,et al.  Interaction Networks for Learning about Objects, Relations and Physics , 2016, NIPS.

[321]  Peter Stone,et al.  Deep Reinforcement Learning in Parameterized Action Space , 2015, ICLR.

[322]  Michael I. Jordan,et al.  Unsupervised Domain Adaptation with Residual Transfer Networks , 2016, NIPS.

[323]  Nando de Freitas,et al.  Neural Programmer-Interpreters , 2015, ICLR.

[324]  Sergey Levine,et al.  End-to-End Training of Deep Visuomotor Policies , 2015, J. Mach. Learn. Res..

[325]  Oriol Vinyals,et al.  Matching Networks for One Shot Learning , 2016, NIPS.

[326]  Tom Schaul,et al.  Prioritized Experience Replay , 2015, ICLR.

[327]  Yelong Shen,et al.  ReasoNet: Learning to Stop Reading in Machine Comprehension , 2016, CoCo@NIPS.

[328]  Geoffrey E. Hinton,et al.  Using Fast Weights to Attend to the Recent Past , 2016, NIPS.

[329]  Jim Duggan,et al.  An Experimental Review of Reinforcement Learning Algorithms for Adaptive Traffic Signal Control , 2016, Autonomic Road Transport Support Systems.

[330]  Sergey Levine,et al.  High-Dimensional Continuous Control Using Generalized Advantage Estimation , 2015, ICLR.

[331]  Sergey Levine,et al.  Continuous Deep Q-Learning with Model-based Acceleration , 2016, ICML.

[332]  Joshua B. Tenenbaum,et al.  Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation , 2016, NIPS.

[333]  Martha White,et al.  An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning , 2015, J. Mach. Learn. Res..

[334]  Alex Graves,et al.  Associative Long Short-Term Memory , 2016, ICML.

[335]  George Kurian,et al.  Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation , 2016, ArXiv.

[336]  Nicolas Usunier,et al.  Episodic Exploration for Deep Deterministic Policies: An Application to StarCraft Micromanagement Tasks , 2016, ArXiv.

[337]  Heiga Zen,et al.  WaveNet: A Generative Model for Raw Audio , 2016, SSW.

[338]  Gökhan Tür,et al.  End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding , 2016, INTERSPEECH.

[339]  Ying Zhang,et al.  Towards End-to-End Speech Recognition with Deep Convolutional Neural Networks , 2016, INTERSPEECH.

[340]  Marek Petrik,et al.  Proximal Gradient Temporal Difference Learning Algorithms , 2016, IJCAI.

[341]  J. Pearl,et al.  Causal inference in statistics , 2016 .

[342]  John Schulman,et al.  Concrete Problems in AI Safety , 2016, ArXiv.

[343]  Maxine Eskénazi,et al.  Towards End-to-End Learning for Dialog State Tracking and Management using Deep Reinforcement Learning , 2016, SIGDIAL Conference.

[344]  Rudolf Kadlec,et al.  Text Understanding with the Attention Sum Reader Network , 2016, ACL.

[345]  Josef Urban,et al.  DeepMath - Deep Sequence Models for Premise Selection , 2016, NIPS.

[346]  Martha White,et al.  Investigating Practical Linear Temporal Difference Learning , 2016, AAMAS.

[347]  Jing He,et al.  A Sequence-to-Sequence Model for User Simulation in Spoken Dialogue Systems , 2016, INTERSPEECH.

[348]  Razvan Pascanu,et al.  A simple neural network module for relational reasoning , 2017, NIPS.

[349]  Sergey Levine,et al.  Collective robot reinforcement learning with distributed asynchronous guided policy search , 2016, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[350]  Tom Schaul,et al.  FeUdal Networks for Hierarchical Reinforcement Learning , 2017, ICML.

[351]  David Vandyke,et al.  A Network-based End-to-End Trainable Task-oriented Dialogue System , 2016, EACL.

[352]  Joelle Pineau,et al.  An Actor-Critic Algorithm for Sequence Prediction , 2016, ICLR.

[353]  Filip De Turck,et al.  #Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning , 2016, NIPS.

[354]  Kirthevasan Kandasamy,et al.  Batch Policy Gradient Methods for Improving Neural Conversation Models , 2017, ICLR.

[355]  Marc'Aurelio Ranzato,et al.  Gradient Episodic Memory for Continual Learning , 2017, NIPS.

[356]  Marc G. Bellemare,et al.  Count-Based Exploration with Neural Density Models , 2017, ICML.

[357]  Lei Zhang,et al.  Sentiment Analysis and Opinion Mining , 2017, Encyclopedia of Machine Learning and Data Mining.

[358]  Doina Precup,et al.  The Option-Critic Architecture , 2016, AAAI.

[359]  Matthew R. G. Brown,et al.  Learning stable and predictive network-based patterns of schizophrenia and its clinical symptoms , 2017, npj Schizophrenia.

[360]  Siqi Liu,et al.  Improved Image Captioning via Policy Gradient optimization of SPIDEr , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[361]  Quoc V. Le,et al.  Neural Optimizer Search with Reinforcement Learning , 2017, ICML.

[362]  Zhao Chen,et al.  The Game Imitation: Deep Supervised Convolutional Networks for Quick Video Game AI , 2017, ArXiv.

[363]  Liang Lin,et al.  Attention-Aware Face Hallucination via Deep Reinforcement Learning , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[364]  Wolfram Burgard,et al.  Deep reinforcement learning with successor features for navigation across similar environments , 2016, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[365]  Percy Liang,et al.  From Language to Programs: Bridging Reinforcement Learning and Maximum Marginal Likelihood , 2017, ACL.

[366]  Philip S. Yu,et al.  Learning Multiple Tasks with Multilinear Relationship Networks , 2015, NIPS.

[367]  Martin A. Riedmiller,et al.  Leveraging Demonstrations for Deep Reinforcement Learning on Robotics Problems with Sparse Rewards , 2017, ArXiv.

[368]  Peter Stone,et al.  Intrinsically motivated model learning for developing curious robots , 2017, Artif. Intell..

[369]  Oliver Brock,et al.  Interactive Perception: Leveraging Action in Perception and Perception in Action , 2016, IEEE Transactions on Robotics.

[370]  Bhaskara Marthi,et al.  A generative vision model that trains with high data efficiency and breaks text-based CAPTCHAs , 2017, Science.

[371]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[372]  Razvan Pascanu,et al.  Visual Interaction Networks: Learning a Physics Simulator from Video , 2017, NIPS.

[373]  Alexei A. Efros,et al.  Curiosity-Driven Exploration by Self-Supervised Prediction , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[374]  Ji Feng,et al.  Deep Forest: Towards An Alternative to Deep Neural Networks , 2017, IJCAI.

[375]  Jason Weston,et al.  Learning through Dialogue Interactions by Asking Questions , 2016, ICLR.

[376]  Ameet Talwalkar,et al.  Federated Multi-Task Learning , 2017, NIPS.

[377]  Zeb Kurth-Nelson,et al.  Learning to reinforcement learn , 2016, CogSci.

[378]  Lantao Yu,et al.  SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient , 2016, AAAI.

[379]  Balaraman Ravindran,et al.  Learning to Repeat: Fine Grained Action Repetition for Deep Reinforcement Learning , 2017, ICLR.

[380]  Graham Neubig,et al.  Neural Machine Translation and Sequence-to-sequence Models: A Tutorial , 2017, ArXiv.

[381]  Elman Mansimov,et al.  Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation , 2017, NIPS.

[382]  José M. F. Moura,et al.  Natural Language Does Not Emerge ‘Naturally’ in Multi-Agent Dialog , 2017, EMNLP.

[383]  Nan Jiang,et al.  Contextual Decision Processes with low Bellman rank are PAC-Learnable , 2016, ICML.

[384]  Pieter Abbeel,et al.  Third-Person Imitation Learning , 2017, ICLR.

[385]  Demis Hassabis,et al.  Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm , 2017, ArXiv.

[386]  Zachary C. Lipton,et al.  Improving Factor-Based Quantitative Investing by Forecasting Company Fundamentals , 2017, ArXiv.

[387]  Tom Schaul,et al.  The Predictron: End-To-End Learning and Planning , 2016, ICML.

[388]  Dan Klein,et al.  Modular Multitask Reinforcement Learning with Policy Sketches , 2016, ICML.

[389]  Shimon Whiteson,et al.  Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning , 2017, ICML.

[390]  Shie Mannor,et al.  A Deep Hierarchical Approach to Lifelong Learning in Minecraft , 2016, AAAI.

[391]  Fan Yang,et al.  Good Semi-supervised Learning That Requires a Bad GAN , 2017, NIPS.

[392]  Pieter Abbeel,et al.  Stochastic Neural Networks for Hierarchical Reinforcement Learning , 2016, ICLR.

[393]  Nicholas Rhinehart,et al.  First-Person Activity Forecasting with Online Inverse Reinforcement Learning , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[394]  Misha Denil,et al.  Learning to Perform Physics Experiments via Deep Reinforcement Learning , 2016, ICLR.

[395]  Stefano Ermon,et al.  InfoGAIL: Interpretable Imitation Learning from Visual Demonstrations , 2017, NIPS.

[396]  Marcin Andrychowicz,et al.  Hindsight Experience Replay , 2017, NIPS.

[397]  Yann Dauphin,et al.  Convolutional Sequence to Sequence Learning , 2017, ICML.

[398]  Zheng Zhang,et al.  Saliency-based Sequential Image Attention with Multiset Prediction , 2017, NIPS.

[399]  Bart De Schutter,et al.  Residential Demand Response of Thermostatically Controlled Loads Using Batch Reinforcement Learning , 2017, IEEE Transactions on Smart Grid.

[400]  Sergey Levine,et al.  Generalizing Skills with Semi-Supervised Reinforcement Learning , 2016, ICLR.

[401]  Stephen Tyree,et al.  Reinforcement Learning through Asynchronous Advantage Actor-Critic on a GPU , 2016, ICLR.

[402]  Vladlen Koltun,et al.  Learning to Act by Predicting the Future , 2016, ICLR.

[403]  Chen Liang,et al.  Neural Symbolic Machines: Learning Semantic Parsers on Freebase with Weak Supervision , 2016, ACL.

[404]  Lihong Li,et al.  Stochastic Variance Reduction Methods for Policy Evaluation , 2017, ICML.

[405]  Xi Chen,et al.  Evolution Strategies as a Scalable Alternative to Reinforcement Learning , 2017, ArXiv.

[406]  Razvan Pascanu,et al.  Sim-to-Real Robot Learning from Pixels with Progressive Nets , 2016, CoRL.

[407]  Tom Schaul,et al.  Building Machines that Learn and Think for Themselves: Commentary on Lake et al., Behavioral and Brain Sciences, 2017 , 2017, 1711.08378.

[408]  Razvan Pascanu,et al.  Overcoming catastrophic forgetting in neural networks , 2016, Proceedings of the National Academy of Sciences.

[409]  Ramakanth Pasunuru,et al.  Reinforced Video Captioning with Entailment Rewards , 2017, EMNLP.

[410]  Yuandong Tian,et al.  ELF: An Extensive, Lightweight and Flexible Research Platform for Real-time Strategy Games , 2017, NIPS.

[411]  Yedid Hoshen,et al.  VAIN: Attentional Multi-agent Predictive Modeling , 2017, NIPS.

[412]  Tim Rocktäschel,et al.  End-to-end Differentiable Proving , 2017, NIPS.

[413]  Anca D. Dragan,et al.  Inverse Reward Design , 2017, NIPS.

[414]  Wang Ling,et al.  Learning to Compose Words into Sentences with Reinforcement Learning , 2016, ICLR.

[415]  Byron Boots,et al.  Predictive-State Decoders: Encoding the Future into Recurrent Networks , 2017, NIPS.

[416]  Jitendra Malik,et al.  Learning to Optimize Neural Nets , 2017, ArXiv.

[417]  Shane Legg,et al.  Deep Reinforcement Learning from Human Preferences , 2017, NIPS.

[418]  Been Kim,et al.  Towards A Rigorous Science of Interpretable Machine Learning , 2017, 1702.08608.

[419]  Marcin Andrychowicz,et al.  One-Shot Imitation Learning , 2017, NIPS.

[420]  Tom Schaul,et al.  StarCraft II: A New Challenge for Reinforcement Learning , 2017, ArXiv.

[421]  Lihong Li,et al.  Neuro-Symbolic Program Synthesis , 2016, ICLR.

[422]  Jianfeng Gao,et al.  End-to-End Task-Completion Neural Dialogue Systems , 2017, IJCNLP.

[423]  Gang Hua,et al.  Collaborative Deep Reinforcement Learning for Joint Object Search , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[424]  Marijn F. Stollenga,et al.  Continual curiosity-driven skill acquisition from high-dimensional video inputs for humanoid robots , 2017, Artif. Intell..

[425]  Rémi Munos,et al.  Minimax Regret Bounds for Reinforcement Learning , 2017, ICML.

[426]  Quoc V. Le,et al.  Neural Architecture Search with Reinforcement Learning , 2016, ICLR.

[427]  Tomas Pfister,et al.  Learning from Simulated and Unsupervised Images through Adversarial Training , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[428]  Aurko Roy,et al.  Learning to Remember Rare Events , 2017, ICLR.

[429]  Yann Dauphin,et al.  Deal or No Deal? End-to-End Learning of Negotiation Dialogues , 2017, EMNLP.

[430]  Vaibhava Goel,et al.  Self-Critical Sequence Training for Image Captioning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[431]  Damien Ernst,et al.  Reinforcement Learning for Electric Power System Decision and Control: Past Considerations and Perspectives , 2017 .

[432]  Ramesh Raskar,et al.  Designing Neural Network Architectures using Reinforcement Learning , 2016, ICLR.

[433]  Sebastian Ruder,et al.  An Overview of Multi-Task Learning in Deep Neural Networks , 2017, ArXiv.

[434]  Nahum Shimkin,et al.  Averaged-DQN: Variance Reduction and Stabilization for Deep Reinforcement Learning , 2016, ICML.

[435]  Qinru Qiu,et al.  A Hierarchical Framework of Cloud Resource Allocation and Power Management Using Deep Reinforcement Learning , 2017, 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS).

[436]  Nan Jiang,et al.  Repeated Inverse Reinforcement Learning , 2017, NIPS.

[437]  Richard E. Turner,et al.  Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning , 2017, NIPS.

[438]  Raymond Y. K. Lau,et al.  Least Squares Generative Adversarial Networks , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[439]  Peng Peng,et al.  Multiagent Bidirectionally-Coordinated Nets: Emergence of Human-level Coordination in Learning to Play StarCraft Combat Games , 2017, 1703.10069.

[440]  A. Ng,et al.  MURA: Large Dataset for Abnormality Detection in Musculoskeletal Radiographs. , 2017 .

[441]  Ali Farhadi,et al.  Target-driven visual navigation in indoor scenes using deep reinforcement learning , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[442]  Tom M. Mitchell,et al.  Leveraging Knowledge Bases in LSTMs for Improving Machine Reading , 2017, ACL.

[443]  Yi Wu,et al.  Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments , 2017, NIPS.

[444]  Junjie Yan,et al.  Practical Network Blocks Design with Q-Learning , 2017, ArXiv.

[445]  Eric P. Xing,et al.  Deep Variation-Structured Reinforcement Learning for Visual Relationship and Attribute Detection , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[446]  Don Monroe Deep learning takes on translation , 2017, Commun. ACM.

[447]  Marlos C. Machado,et al.  A Laplacian Framework for Option Discovery in Reinforcement Learning , 2017, ICML.

[448]  Justin Fu,et al.  EX2: Exploration with Exemplar Models for Deep Reinforcement Learning , 2017, NIPS.

[449]  Wenhan Xiong,et al.  DeepPath: A Reinforcement Learning Method for Knowledge Graph Reasoning , 2017, EMNLP.

[450]  Yang Liu,et al.  Learning to Play in a Day: Faster Deep Reinforcement Learning by Optimality Tightening , 2016, ICLR.

[451]  Andreas Krause,et al.  Safe Model-based Reinforcement Learning with Stability Guarantees , 2017, NIPS.

[452]  Sebastian Nowozin,et al.  DeepCoder: Learning to Write Programs , 2016, ICLR.

[453]  David Sontag,et al.  Learning a Health Knowledge Graph from Electronic Medical Records , 2017, Scientific Reports.

[454]  Balaraman Ravindran,et al.  Attend, Adapt and Transfer: Attentive Deep Architecture for Adaptive Transfer from multiple sources in the same domain , 2015, ICLR.

[455]  Sergey Levine,et al.  Deep visual foresight for planning robot motion , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[456]  Sergey Levine,et al.  Reinforcement Learning with Deep Energy-Based Policies , 2017, ICML.

[457]  Gabriel Synnaeve,et al.  STARDATA: A StarCraft AI Research Dataset , 2017, AIIDE.

[458]  Kyunghyun Cho,et al.  Task-Oriented Query Reformulation with Reinforcement Learning , 2017, EMNLP.

[459]  Dileep George,et al.  Schema Networks: Zero-shot Transfer with a Generative Causal Model of Intuitive Physics , 2017, ICML.

[460]  Dale Schuurmans,et al.  Bridging the Gap Between Value and Policy Based Reinforcement Learning , 2017, NIPS.

[461]  Jin Young Choi,et al.  Action-Decision Networks for Visual Tracking with Deep Reinforcement Learning , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[462]  Oladimeji Farri,et al.  Diagnostic Inferencing via Improving Clinical Concept Extraction with Deep Reinforcement Learning: A Preliminary Study , 2017, MLHC.

[463]  Lukasz Kaiser,et al.  One Model To Learn Them All , 2017, ArXiv.

[464]  Tuomas Sandholm,et al.  Safe and Nested Subgame Solving for Imperfect-Information Games , 2017, NIPS.

[465]  Geoffrey Zweig,et al.  Hybrid Code Networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning , 2017, ACL.

[466]  Byron Boots,et al.  Predictive State Recurrent Neural Networks , 2017, NIPS.

[467]  Kai-Uwe Kühnberger,et al.  Neural-Symbolic Learning and Reasoning: A Survey and Interpretation , 2017, Neuro-Symbolic Artificial Intelligence.

[468]  Yuxi Li,et al.  Deep Reinforcement Learning: An Overview , 2017, ArXiv.

[469]  Richard Socher,et al.  Knowing When to Look: Adaptive Attention via a Visual Sentinel for Image Captioning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[470]  Luke S. Zettlemoyer,et al.  Deep Semantic Role Labeling: What Works and What’s Next , 2017, ACL.

[471]  Lawrence D. Jackel,et al.  Explaining How a Deep Neural Network Trained with End-to-End Learning Steers a Car , 2017, ArXiv.

[472]  Yuval Tassa,et al.  Emergence of Locomotion Behaviours in Rich Environments , 2017, ArXiv.

[473]  Christopher Burgess,et al.  beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework , 2016, ICLR 2016.

[474]  Deva Ramanan,et al.  Tracking as Online Decision-Making: Learning a Policy from Streaming Videos with Reinforcement Learning , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[475]  David Berthelot,et al.  BEGAN: Boundary Equilibrium Generative Adversarial Networks , 2017, ArXiv.

[476]  Alexander M. Rush,et al.  OpenNMT: Open-Source Toolkit for Neural Machine Translation , 2017, ACL.

[477]  Bhaskar Mitra,et al.  Neural Models for Information Retrieval , 2017, ArXiv.

[478]  Alexander Knapp,et al.  Transferring Context-Dependent Test Inputs , 2017, 2017 IEEE International Conference on Software Quality, Reliability and Security (QRS).

[479]  Geoffrey Zweig,et al.  The microsoft 2016 conversational speech recognition system , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[480]  Jacob biamonte,et al.  Quantum machine learning , 2016, Nature.

[481]  Yuandong Tian,et al.  Training Agent for First-Person Shooter Game with Actor-Critic Curriculum Learning , 2016, ICLR.

[482]  Jonathan P. How,et al.  Deep Decentralized Multi-task Multi-Agent Reinforcement Learning under Partial Observability , 2017, ICML.

[483]  Masayoshi Tomizuka,et al.  Designing the Robot Behavior for Safe Human–Robot Interactions , 2017 .

[484]  Ming Zhou,et al.  Gated Self-Matching Networks for Reading Comprehension and Question Answering , 2017, ACL.

[485]  Misha Denil,et al.  Learned Optimizers that Scale and Generalize , 2017, ICML.

[486]  Sergey Levine,et al.  Learning Invariant Feature Spaces to Transfer Skills with Reinforcement Learning , 2017, ICLR.

[487]  Kenneth O. Stanley,et al.  Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning , 2017, ArXiv.

[488]  Rachid Guerraoui,et al.  Dynamic Safe Interruptibility for Decentralized Multi-Agent Reinforcement Learning , 2017, NIPS.

[489]  Richard Socher,et al.  Learned in Translation: Contextualized Word Vectors , 2017, NIPS.

[490]  Peng Zhang,et al.  IRGAN: A Minimax Game for Unifying Generative and Discriminative Information Retrieval Models , 2017, SIGIR.

[491]  Samy Bengio,et al.  Device Placement Optimization with Reinforcement Learning , 2017, ICML.

[492]  Tom M. Mitchell,et al.  What can machine learning do? Workforce implications , 2017, Science.

[493]  Stefan Lee,et al.  Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[494]  Marc G. Bellemare,et al.  A Distributional Perspective on Reinforcement Learning , 2017, ICML.

[495]  Richard S. Zemel,et al.  Prototypical Networks for Few-shot Learning , 2017, NIPS.

[496]  Demis Hassabis,et al.  Mastering the game of Go without human knowledge , 2017, Nature.

[497]  Alexei A. Efros,et al.  Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[498]  Geoffrey E. Hinton,et al.  Dynamic Routing Between Capsules , 2017, NIPS.

[499]  Ning Zhang,et al.  Deep Reinforcement Learning-Based Image Captioning with Embedding Reward , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[500]  Kam-Fai Wong,et al.  Composite Task-Completion Dialogue System via Hierarchical Deep Reinforcement Learning , 2017, ArXiv.

[501]  Sergey Levine,et al.  Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks , 2017, ICML.

[502]  Sandy H. Huang,et al.  Adversarial Attacks on Neural Network Policies , 2017, ICLR.

[503]  Padhraic Smyth,et al.  Science and data science , 2017, Proceedings of the National Academy of Sciences.

[504]  Nando de Freitas,et al.  Robust Imitation of Diverse Behaviors , 2017, NIPS.

[505]  Satinder Singh,et al.  Value Prediction Network , 2017, NIPS.

[506]  Yee Whye Teh,et al.  Distral: Robust multitask reinforcement learning , 2017, NIPS.

[507]  Marc G. Bellemare,et al.  The Cramer Distance as a Solution to Biased Wasserstein Gradients , 2017, ArXiv.

[508]  Pieter Abbeel,et al.  Equivalence Between Policy Gradients and Soft Q-Learning , 2017, ArXiv.

[509]  Yuan Li,et al.  Learning how to Active Learn: A Deep Reinforcement Learning Approach , 2017, EMNLP.

[510]  Dawn Song,et al.  Robust Physical-World Attacks on Deep Learning Models , 2017, ArXiv.

[511]  Dirk Ormoneit,et al.  Kernel-Based Reinforcement Learning , 2017, Encyclopedia of Machine Learning and Data Mining.

[512]  Razvan Pascanu,et al.  Learning to Navigate in Complex Environments , 2016, ICLR.

[513]  Joel Z. Leibo,et al.  Multi-agent Reinforcement Learning in Sequential Social Dilemmas , 2017, AAMAS.

[514]  Ser-Nam Lim,et al.  A Reinforcement Learning Approach to the View Planning Problem , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[515]  Samy Bengio,et al.  Neural Combinatorial Optimization with Reinforcement Learning , 2016, ICLR.

[516]  Tom Schaul,et al.  Reinforcement Learning with Unsupervised Auxiliary Tasks , 2016, ICLR.

[517]  Tom Schaul,et al.  Successor Features for Transfer in Reinforcement Learning , 2016, NIPS.

[518]  Yann LeCun,et al.  Model-Based Planning in Discrete Action Spaces , 2017, ArXiv.

[519]  Randy H. Katz,et al.  A Berkeley View of Systems Challenges for AI , 2017, ArXiv.

[520]  Alec Radford,et al.  Proximal Policy Optimization Algorithms , 2017, ArXiv.

[521]  Joel Z. Leibo,et al.  A multi-agent reinforcement learning model of common-pool resource appropriation , 2017, NIPS.

[522]  Bernhard Schölkopf,et al.  Elements of Causal Inference: Foundations and Learning Algorithms , 2017 .

[523]  Guillaume Lample,et al.  Playing FPS Games with Deep Reinforcement Learning , 2016, AAAI.

[524]  Olivier Pietquin,et al.  End-to-end optimization of goal-driven and visually grounded dialogue systems , 2017, IJCAI.

[525]  Sergey Levine,et al.  Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[526]  Jiajun Wu,et al.  Neural Scene De-rendering , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[527]  Yuval Tassa,et al.  Learning human behaviors from motion capture by adversarial imitation , 2017, ArXiv.

[528]  Martín Abadi,et al.  Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data , 2016, ICLR.

[529]  Cezary Kaliszyk,et al.  Deep Network Guided Proof Search , 2017, LPAR.

[530]  Richard Socher,et al.  Dynamic Coattention Networks For Question Answering , 2016, ICLR.

[531]  Kaiming He,et al.  Mask R-CNN , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[532]  Jitendra Malik,et al.  Learning to Optimize , 2016, ICLR.

[533]  Joshua B. Tenenbaum,et al.  Beating the World's Best at Super Smash Bros. with Deep Reinforcement Learning , 2017, ArXiv.

[534]  Aaron C. Courville,et al.  Improved Training of Wasserstein GANs , 2017, NIPS.

[535]  Ping Tan,et al.  DualGAN: Unsupervised Dual Learning for Image-to-Image Translation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[536]  Jiwen Lu,et al.  3DCNN-DQN-RNN: A Deep Reinforcement Learning Framework for Semantic Parsing of Large-Scale 3D Point Clouds , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[537]  Jiwen Lu,et al.  Attention-Aware Deep Reinforcement Learning for Video Face Recognition , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[538]  Le Song,et al.  Learning Combinatorial Optimization Algorithms over Graphs , 2017, NIPS.

[539]  Julie A. Shah,et al.  C-LEARN: Learning geometric constraints from demonstrations for multi-step manipulation in shared autonomy , 2017, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[540]  Nikos Komodakis,et al.  Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer , 2016, ICLR.

[541]  Jason Weston,et al.  Learning End-to-End Goal-Oriented Dialog , 2016, ICLR.

[542]  Kunle Olukotun,et al.  Infrastructure for Usable Machine Learning: The Stanford DAWN Project , 2017, ArXiv.

[543]  Jianfeng Gao,et al.  Towards End-to-End Reinforcement Learning of Dialogue Agents for Information Access , 2016, ACL.

[544]  Bernhard Schölkopf,et al.  Discovering Causal Signals in Images , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[545]  Lorenzo Rosasco,et al.  Why and when can deep-but not shallow-networks avoid the curse of dimensionality: A review , 2016, International Journal of Automation and Computing.

[546]  Tom Michael Mitchell,et al.  Track how technology is transforming work , 2017, Nature.

[547]  Yiwei Zhang,et al.  Reinforcement Mechanism Design for Fraudulent Behaviour in e-Commerce , 2018, AAAI.

[548]  Terrence J. Sejnowski,et al.  Glider soaring via reinforcement learning in the field , 2018, Nature.

[549]  Shuai Li,et al.  TopRank: A practical algorithm for online stochastic ranking , 2018, NeurIPS.

[550]  Richard Socher,et al.  A Deep Reinforced Model for Abstractive Summarization , 2017, ICLR.

[551]  Lawrence V. Snyder,et al.  Reinforcement Learning for Solving the Vehicle Routing Problem , 2018, NeurIPS.

[552]  Nenghai Yu,et al.  Model-Level Dual Learning , 2018, ICML.

[553]  Joshua B. Tenenbaum,et al.  End-to-End Differentiable Physics for Learning and Control , 2018, NeurIPS.

[554]  Kenneth O. Stanley,et al.  Safe mutations for deep and recurrent neural networks through output gradients , 2017, GECCO.

[555]  Tom Schaul,et al.  Rainbow: Combining Improvements in Deep Reinforcement Learning , 2017, AAAI.

[556]  Xin Wang,et al.  No Metrics Are Perfect: Adversarial Reward Learning for Visual Storytelling , 2018, ACL.

[557]  Marcin Andrychowicz,et al.  Sim-to-Real Transfer of Robotic Control with Dynamics Randomization , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[558]  Tommi S. Jaakkola,et al.  Towards Robust Interpretability with Self-Explaining Neural Networks , 2018, NeurIPS.

[559]  Martha White,et al.  Reinforcement Learning with Function-Valued Action Spaces for Partial Differential Equation Control , 2018, ICML.

[560]  Jitendra Malik,et al.  SFV: Reinforcement Learning of Physical Skills from Videos , 2018, ACM Trans. Graph.

[561]  Kaiming He,et al.  Exploring the Limits of Weakly Supervised Pretraining , 2018, ECCV.

[562]  Utkarsh Upadhyay,et al.  Deep Reinforcement Learning of Marked Temporal Point Processes , 2018, NeurIPS.

[563]  Yang Cai,et al.  Learning Safe Policies with Expert Guidance , 2018, NeurIPS.

[564]  Eric Xing,et al.  Deep Generative Models with Learnable Knowledge Constraints , 2018, NeurIPS.

[565]  Song Han,et al.  AMC: AutoML for Model Compression and Acceleration on Mobile Devices , 2018, ECCV.

[566]  Razvan Pascanu,et al.  Relational inductive biases, deep learning, and graph networks , 2018, ArXiv.

[567]  Sergey Levine,et al.  DeepMimic: Example-Guided Deep Reinforcement Learning of Physics-Based Character Skills , 2018, ACM Trans. Graph.

[568]  Le Song,et al.  SBEED: Convergent Reinforcement Learning with Nonlinear Function Approximation , 2017, ICML.

[569]  Shimon Whiteson,et al.  Learning with Opponent-Learning Awareness , 2017, AAMAS.

[570]  David Budden,et al.  Distributed Prioritized Experience Replay , 2018, ICLR.

[571]  Jeffrey Dean,et al.  Scalable and accurate deep learning with electronic health records , 2018, npj Digital Medicine.

[572]  Tao Chen,et al.  Hardware Conditioned Policies for Multi-Robot Transfer Learning , 2018, NeurIPS.

[573]  Kenneth O. Stanley,et al.  Improving Exploration in Evolution Strategies for Deep Reinforcement Learning via a Population of Novelty-Seeking Agents , 2017, NeurIPS.

[574]  Clare Lyle,et al.  GAN Q-learning , 2018, ArXiv.

[575]  Amir-massoud Farahmand,et al.  Iterative Value-Aware Model Learning , 2018, NeurIPS.

[576]  Marlos C. Machado,et al.  Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents (Extended Abstract) , 2018, IJCAI.

[577]  Arvind Satyanarayan,et al.  The Building Blocks of Interpretability , 2018, Distill.

[578]  Xiaoyan Zhu,et al.  Emotional Chatting Machine: Emotional Conversation Generation with Internal and External Memory , 2017, AAAI.

[579]  Sanja Fidler,et al.  NerveNet: Learning Structured Policy with Graph Neural Networks , 2018, ICLR.

[580]  Ji Feng,et al.  AutoEncoder by Forest , 2017, AAAI.

[581]  Craig Boutilier,et al.  Data center cooling using model-predictive control , 2018, NeurIPS.

[582]  Christopher D. Manning,et al.  Compositional Attention Networks for Machine Reasoning , 2018, ICLR.

[583]  David Silver,et al.  Meta-Gradient Reinforcement Learning , 2018, NeurIPS.

[584]  Samuel J Gershman,et al.  The Successor Representation: Its Computational Logic and Neural Substrates , 2018, The Journal of Neuroscience.

[585]  Vladlen Koltun,et al.  Multi-Task Learning as Multi-Objective Optimization , 2018, NeurIPS.

[586]  Shimon Whiteson,et al.  Counterfactual Multi-Agent Policy Gradients , 2017, AAAI.

[587]  Shuai Wang,et al.  Deep learning for sentiment analysis: A survey , 2018, WIREs Data Mining Knowl. Discov..

[588]  Joel Z. Leibo,et al.  Inequity aversion improves cooperation in intertemporal social dilemmas , 2018, NeurIPS.

[589]  Sergey Levine,et al.  Meta-Learning and Universality: Deep Representations and Gradient Descent can Approximate any Learning Algorithm , 2017, ICLR.

[590]  Martin Müller,et al.  Memory-Augmented Monte Carlo Tree Search , 2018, AAAI.

[591]  Adnan Darwiche,et al.  Human-level intelligence or animal-like abilities? , 2017, Commun. ACM.

[592]  Shie Mannor,et al.  Learn What Not to Learn: Action Elimination with Deep Reinforcement Learning , 2018, NeurIPS.

[593]  Shane Legg,et al.  Reward learning from human preferences and demonstrations in Atari , 2018, NeurIPS.

[594]  Karen Simonyan,et al.  The challenge of realistic music generation: modelling raw audio at scale , 2018, NeurIPS.

[595]  Philip Bachman,et al.  Deep Reinforcement Learning that Matters , 2017, AAAI.

[596]  Gang Pan,et al.  Knowledge-Guided Agent-Tactic-Aware Learning for StarCraft Micromanagement , 2018, IJCAI.

[597]  Ole Winther,et al.  Recurrent Relational Networks , 2017, NeurIPS.

[598]  Satinder Singh,et al.  On Learning Intrinsic Rewards for Policy Gradient Methods , 2018, NeurIPS.

[599]  Jie Zhang,et al.  Inference Aided Reinforcement Learning for Incentive Mechanism Design in Crowdsourcing , 2018, NeurIPS.

[600]  D. Sculley,et al.  Winner's Curse? On Pace, Progress, and Empirical Rigor , 2018, ICLR.

[601]  Douglas Eck,et al.  A Neural Representation of Sketch Drawings , 2017, ICLR.

[602]  Sergey Levine,et al.  Probabilistic Model-Agnostic Meta-Learning , 2018, NeurIPS.

[603]  Byron Boots,et al.  Dual Policy Iteration , 2018, NeurIPS.

[604]  Le Song,et al.  Boosting the Actor with Dual Critic , 2017, ICLR.

[605]  Pascal Poupart,et al.  Unsupervised Video Object Segmentation for Deep Reinforcement Learning , 2018, NeurIPS.

[606]  Pieter Abbeel,et al.  Learning Plannable Representations with Causal InfoGAN , 2018, NeurIPS.

[607]  Yao Liu,et al.  Representation Balancing MDPs for Off-Policy Policy Evaluation , 2018, NeurIPS.

[608]  David Duvenaud,et al.  Neural Ordinary Differential Equations , 2018, NeurIPS.

[609]  Thomas L. Griffiths,et al.  Recasting Gradient-Based Meta-Learning as Hierarchical Bayes , 2018, ICLR.

[610]  Honglak Lee,et al.  Multitask Reinforcement Learning for Zero-shot Generalization with Subtask Dependencies , 2018, NeurIPS.

[611]  Quanshi Zhang,et al.  Visual interpretability for deep learning: a survey , 2018, Frontiers of Information Technology & Electronic Engineering.

[612]  Shane Legg,et al.  Noisy Networks for Exploration , 2017, ICLR.

[613]  Richard Evans,et al.  Learning Explanatory Rules from Noisy Data , 2017, J. Artif. Intell. Res..

[614]  Amir Hussain,et al.  Applications of Deep Learning and Reinforcement Learning to Biological Data , 2017, IEEE Transactions on Neural Networks and Learning Systems.

[615]  Olexandr Isayev,et al.  Deep reinforcement learning for de novo drug design , 2017, Science Advances.

[616]  Gerald Tesauro,et al.  Learning Abstract Options , 2018, NeurIPS.

[617]  Nathan Kallus,et al.  Confounding-Robust Policy Improvement , 2018, NeurIPS.

[618]  Joel Z. Leibo,et al.  Prefrontal cortex as a meta-reinforcement learning system , 2018, bioRxiv.

[619]  Xinlei Chen,et al.  Iterative Visual Reasoning Beyond Convolutions , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[620]  Liang Zhang,et al.  Deep reinforcement learning for page-wise recommendations , 2018, RecSys.

[621]  Roger Wattenhofer,et al.  Teaching a Machine to Read Maps with Deep Reinforcement Learning , 2017, AAAI.

[622]  Craig Boutilier,et al.  Non-delusional Q-learning and value-iteration , 2018, NeurIPS.

[623]  Tie-Yan Liu,et al.  Neural Architecture Optimization , 2018, NeurIPS.

[624]  Hector Geffner,et al.  Model-free, Model-based, and General Intelligence , 2018, IJCAI.

[625]  Xiaohua Zhai,et al.  The GAN Landscape: Losses, Architectures, Regularization, and Normalization , 2018, ArXiv.

[626]  Gary Marcus,et al.  Deep Learning: A Critical Appraisal , 2018, ArXiv.

[627]  Razvan Pascanu,et al.  Relational Deep Reinforcement Learning , 2018, ArXiv.

[628]  Li Fei-Fei,et al.  Progressive Neural Architecture Search , 2017, ECCV.

[629]  Geoffrey E. Hinton,et al.  Matrix capsules with EM routing , 2018, ICLR.

[630]  Stuart J. Russell,et al.  Meta-Learning MCMC Proposals , 2017, NeurIPS.

[631]  Qingquan Song,et al.  Efficient Neural Architecture Search with Network Morphism , 2018, ArXiv.

[632]  David A. Wagner,et al.  Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples , 2018, ICML.

[633]  Ofir Nachum,et al.  A Lyapunov-based Approach to Safe Reinforcement Learning , 2018, NeurIPS.

[634]  Kai-Fu Lee.  AI Superpowers: China, Silicon Valley, and the New World Order , 2018 .

[635]  Joelle Pineau,et al.  A Survey of Available Corpora for Building Data-Driven Dialogue Systems , 2015, Dialogue Discourse.

[636]  Sergey Levine,et al.  Meta-Reinforcement Learning of Structured Exploration Strategies , 2018, NeurIPS.

[637]  Byron Boots,et al.  Differentiable MPC for End-to-end Planning and Control , 2018, NeurIPS.

[638]  Zheng Wang,et al.  Machine Learning in Compiler Optimization , 2018, Proceedings of the IEEE.

[639]  Chong Wang,et al.  Subgoal Discovery for Hierarchical Dialogue Policy Learning , 2018, EMNLP.

[640]  Lei Li,et al.  Reinforced Co-Training , 2018, NAACL.

[641]  Sergey Levine,et al.  Using Simulation and Domain Adaptation to Improve Efficiency of Deep Robotic Grasping , 2018, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[642]  Sergey Levine,et al.  Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review , 2018, ArXiv.

[643]  Nando de Freitas,et al.  Playing hard exploration games by watching YouTube , 2018, NeurIPS.

[644]  Joelle Pineau,et al.  RE-EVALUATE: Reproducibility in Evaluating Reinforcement Learning Algorithms , 2018 .

[645]  John Miller,et al.  When Recurrent Models Don't Need To Be Recurrent , 2018, ArXiv.

[646]  Zhanxing Zhu,et al.  Reinforced Continual Learning , 2018, NeurIPS.

[647]  Xin Wang,et al.  Video Captioning via Hierarchical Reinforcement Learning , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[648]  Baochun Li,et al.  Post: Device Placement with Cross-Entropy Minimization and Proximal Policy Optimization , 2018, NeurIPS.

[649]  Kirthevasan Kandasamy,et al.  Neural Architecture Search with Bayesian Optimisation and Optimal Transport , 2018, NeurIPS.

[650]  George Papandreou,et al.  Searching for Efficient Multi-Scale Architectures for Dense Image Prediction , 2018, NeurIPS.

[651]  Yujing Hu,et al.  Reinforcement Learning to Rank in E-Commerce Search Engine: Formalization, Analysis, and Application , 2018, KDD.

[652]  Benjamin Van Roy,et al.  Scalable Coordinated Exploration in Concurrent Reinforcement Learning , 2018, NeurIPS.

[653]  Pieter Abbeel,et al.  A Simple Neural Attentive Meta-Learner , 2017, ICLR.

[654]  Shane Legg,et al.  IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures , 2018, ICML.

[655]  Soumik Sarkar,et al.  Online Robust Policy Learning in the Presence of Unknown Adversaries , 2018, NeurIPS.

[656]  Yin Zhou,et al.  VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[657]  Sergey Levine,et al.  Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor , 2018, ICML.

[658]  Yoshua Bengio,et al.  Bayesian Model-Agnostic Meta-Learning , 2018, NeurIPS.

[659]  Oriol Vinyals,et al.  Hierarchical Representations for Efficient Architecture Search , 2017, ICLR.

[660]  H. Francis Song,et al.  Machine Theory of Mind , 2018, ICML.

[661]  Raia Hadsell,et al.  Learning to Navigate in Cities Without a Map , 2018, NeurIPS.

[662]  Rémi Munos,et al.  Learning to Search with MCTSnets , 2018, ICML.

[663]  Pieter Abbeel,et al.  Continuous Adaptation via Meta-Learning in Nonstationary and Competitive Environments , 2017, ICLR.

[664]  Jianfeng Gao,et al.  BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems , 2016, AAAI.

[665]  Russ Tedrake,et al.  Scalable End-to-End Autonomous Vehicle Testing via Rare-event Simulation , 2018, NeurIPS.

[666]  Zhuoran Yang,et al.  Multi-Agent Reinforcement Learning via Double Averaging Primal-Dual Optimization , 2018, NeurIPS.

[667]  Fei Wang,et al.  Deep learning for healthcare: review, opportunities and challenges , 2018, Briefings Bioinform..

[668]  Martin Müller,et al.  Move Prediction Using Deep Convolutional Neural Networks in Hex , 2018, IEEE Transactions on Games.

[669]  Patrick M. Pilarski,et al.  Accelerating Learning in Constructive Predictive Frameworks with the Successor Representation , 2018, 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[670]  Cezary Kaliszyk,et al.  Reinforcement Learning of Theorem Proving , 2018, NeurIPS.

[671]  Vladlen Koltun,et al.  An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling , 2018, ArXiv.

[672]  Samuel J. Gershman,et al.  Human-in-the-Loop Interpretability Prior , 2018, NeurIPS.

[673]  Hyrum S. Anderson,et al.  Learning to Evade Static PE Machine Learning Malware Models via Reinforcement Learning , 2018, ArXiv.

[674]  Zachary C. Lipton,et al.  The mythos of model interpretability , 2018, Commun. ACM.

[675]  Richard Socher,et al.  The Natural Language Decathlon: Multitask Learning as Question Answering , 2018, ArXiv.

[676]  Pierre Baldi,et al.  Solving the Rubik's Cube Without Human Knowledge , 2018, ArXiv.

[677]  Doina Precup,et al.  Learning with Options that Terminate Off-Policy , 2017, AAAI.

[678]  Peter Stone,et al.  Autonomous agents modelling other agents: A comprehensive survey and open problems , 2017, Artif. Intell..

[679]  Chuang Gan,et al.  Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding , 2018, NeurIPS.

[680]  Kagan Tumer,et al.  Evolutionary Reinforcement Learning , 2018, NeurIPS.

[681]  Sergey Levine,et al.  Sim2Real View Invariant Visual Servoing by Recurrent Control , 2017, ArXiv.

[682]  Xue-Xin Wei,et al.  Emergence of grid-like representations by training recurrent neural networks to perform spatial localization , 2018, ICLR.

[683]  Yee Whye Teh,et al.  An Analysis of Categorical Distributional Reinforcement Learning , 2018, AISTATS.

[684]  Emma Brunskill,et al.  Strategic Object Oriented Reinforcement Learning , 2018, ArXiv.

[685]  Yuval Tassa,et al.  DeepMind Control Suite , 2018, ArXiv.

[686]  Chen Liang,et al.  Memory Augmented Policy Optimization for Program Synthesis with Generalization , 2018, ArXiv.

[687]  Eric P. Xing,et al.  Gated Path Planning Networks , 2018, ICML.

[688]  Aleksander Madry,et al.  How Does Batch Normalization Help Optimization? (No, It Is Not About Internal Covariate Shift) , 2018, NeurIPS.

[689]  Nicholas Jing Yuan,et al.  XiaoIce Band: A Melody and Arrangement Generation Framework for Pop Music , 2018, KDD.

[690]  Stefano Ermon,et al.  Multi-Agent Generative Adversarial Imitation Learning , 2018, NeurIPS.

[691]  Carl Doersch,et al.  Learning Visual Question Answering by Bootstrapping Hard Attention , 2018, ECCV.

[692]  Oriol Vinyals,et al.  Representation Learning with Contrastive Predictive Coding , 2018, ArXiv.

[693]  OpenAI.  Learning Dexterous In-Hand Manipulation , 2018, ArXiv.

[694]  William Yang Wang,et al.  Deep Reinforcement Learning for NLP , 2018, ACL.

[695]  Allan Jabri,et al.  Universal Planning Networks , 2018, ICML.

[696]  Neil Houlsby,et al.  Transfer Learning with Neural AutoML , 2018, NeurIPS.

[697]  Alexandre M. Bayen,et al.  Expert Level Control of Ramp Metering Based on Multi-Task Deep Reinforcement Learning , 2017, IEEE Transactions on Intelligent Transportation Systems.

[698]  J. Pearl,et al.  The Book of Why: The New Science of Cause and Effect , 2018 .

[699]  Trevor Darrell,et al.  BDD100K: A Diverse Driving Video Database with Scalable Annotation Tooling , 2018, ArXiv.

[700]  Shi Dong,et al.  An Information-Theoretic Analysis of Thompson Sampling for Large Action Spaces , 2018, NeurIPS.

[701]  William E. Byrd,et al.  Neural Guided Constraint Logic Programming for Program Synthesis , 2018, NeurIPS.

[702]  Shimon Whiteson,et al.  TreeQN and ATreeC: Differentiable Tree-Structured Models for Deep Reinforcement Learning , 2017, ICLR.

[703]  Kristen Grauman,et al.  Learning to Look Around: Intelligently Exploring Unseen Environments for Unknown Tasks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[704]  Eric P. Xing,et al.  On Unifying Deep Generative Models , 2017, ICLR.

[705]  Eneko Agirre,et al.  Unsupervised Neural Machine Translation , 2017, ICLR.

[706]  F. Viégas,et al.  Deep learning of aftershock patterns following large earthquakes , 2018, Nature.

[707]  Jakub W. Pachocki,et al.  Emergent Complexity via Multi-Agent Competition , 2017, ICLR.

[708]  Guy Lever,et al.  Value-Decomposition Networks For Cooperative Multi-Agent Learning Based On Team Reward , 2018, AAMAS.

[709]  Joel Z. Leibo,et al.  Unsupervised Predictive Memory in a Goal-Directed Agent , 2018, ArXiv.

[710]  Pieter Abbeel,et al.  Evolved Policy Gradients , 2018, NeurIPS.

[711]  Lihong Li,et al.  Adversarial Attacks on Stochastic Bandits , 2018, NeurIPS.

[712]  Jürgen Schmidhuber,et al.  Recurrent World Models Facilitate Policy Evolution , 2018, NeurIPS.

[713]  Thierry Moreau,et al.  Learning to Optimize Tensor Programs , 2018, NeurIPS.

[714]  Le Song,et al.  Learning Temporal Point Processes via Reinforcement Learning , 2018, NeurIPS.

[715]  Qiang Yang,et al.  An Overview of Multi-task Learning , 2018 .

[716]  Fangkai Yang,et al.  PEORL: Integrating Symbolic Planning and Hierarchical Reinforcement Learning for Robust Decision-Making , 2018, IJCAI.

[717]  Vijay Vasudevan,et al.  Learning Transferable Architectures for Scalable Image Recognition , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[718]  Jorge Nocedal,et al.  Optimization Methods for Large-Scale Machine Learning , 2016, SIAM Rev..

[719]  Matthew W. Hoffman,et al.  Distributed Distributional Deterministic Policy Gradients , 2018, ICLR.

[720]  Rémi Munos,et al.  Implicit Quantile Networks for Distributional Reinforcement Learning , 2018, ICML.

[721]  Yu Zhang,et al.  Learning to Multitask , 2018, NeurIPS.

[722]  Michael I. Jordan,et al.  Generalized Zero-Shot Learning with Deep Calibration Network , 2018, NeurIPS.

[723]  José M. F. Moura,et al.  Adversarial Multiple Source Domain Adaptation , 2018, NeurIPS.

[724]  Eric P. Xing,et al.  Hybrid Retrieval-Generation Reinforced Agent for Medical Image Report Generation , 2018, NeurIPS.

[725]  Richard S. Sutton,et al.  Multi-step Reinforcement Learning: A Unifying Algorithm , 2017, AAAI.

[726]  Koray Kavukcuoglu,et al.  Neural scene representation and rendering , 2018, Science.

[727]  Tom Schaul,et al.  Deep Q-learning From Demonstrations , 2017, AAAI.

[728]  Rico Sennrich,et al.  Why Self-Attention? A Targeted Evaluation of Neural Machine Translation Architectures , 2018, EMNLP.

[729]  Noam Brown,et al.  Superhuman AI for heads-up no-limit poker: Libratus beats top professionals , 2018, Science.

[730]  Liang Zhang,et al.  Recommendations with Negative Feedback via Pairwise Deep Reinforcement Learning , 2018, KDD.

[731]  Chris Dyer,et al.  On the State of the Art of Evaluation in Neural Language Models , 2017, ICLR.

[732]  Sergey Levine,et al.  One-Shot Imitation from Observing Humans via Domain-Adaptive Meta-Learning , 2018, Robotics: Science and Systems.

[733]  Luc De Raedt,et al.  DeepProbLog: Neural Probabilistic Logic Programming , 2018, BNAIC/BENELEARN.

[734]  Tim Kraska,et al.  The Case for Learned Index Structures , 2018, SIGMOD.

[735]  Peter W. Glynn,et al.  Multi-agent Online Learning with Asynchronous Feedback Loss , 2018, NeurIPS.

[736]  Albin Cassirer,et al.  Randomized Prior Functions for Deep Reinforcement Learning , 2018, NeurIPS.

[737]  Nicholas Jing Yuan,et al.  DRN: A Deep Reinforcement Learning Framework for News Recommendation , 2018, WWW.

[738]  Nan Jiang,et al.  Hierarchical Imitation and Reinforcement Learning , 2018, ICML.

[739]  Razvan Pascanu,et al.  Relational recurrent neural networks , 2018, NeurIPS.

[740]  Geoffrey J. Gordon,et al.  Learning Beam Search Policies via Imitation Learning , 2018, NeurIPS.

[741]  Shimon Whiteson,et al.  QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning , 2018, ICML.

[742]  Pan He,et al.  Adversarial Examples: Attacks and Defenses for Deep Learning , 2017, IEEE Transactions on Neural Networks and Learning Systems.

[743]  Frank Hutter,et al.  Neural Architecture Search: A Survey , 2018, J. Mach. Learn. Res..

[744]  Hamed Haddadi,et al.  Deep Learning in Mobile and Wireless Networking: A Survey , 2018, IEEE Communications Surveys & Tutorials.

[745]  Quoc V. Le,et al.  Do Better ImageNet Models Transfer Better? , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[746]  Gaëtan Hadjeres,et al.  Deep Learning Techniques for Music Generation , 2019 .

[747]  Lukasz Kaiser,et al.  Universal Transformers , 2018, ICLR.

[748]  Yang Yu,et al.  Virtual-Taobao: Virtualizing Real-world Online Retail Environment for Reinforcement Learning , 2018, AAAI.

[749]  Rahul Sukthankar,et al.  Cognitive Mapping and Planning for Visual Navigation , 2017, International Journal of Computer Vision.

[750]  Zachary C. Lipton,et al.  Troubling Trends in Machine Learning Scholarship , 2018, ACM Queue.

[751]  Ming-Wei Chang,et al.  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.

[752]  Alexei A. Efros,et al.  Everybody Dance Now , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[753]  Guy Lever,et al.  Human-level performance in 3D multiplayer games with population-based reinforcement learning , 2018, Science.

[754]  Yang Liu,et al.  THUMT: An Open-Source Toolkit for Neural Machine Translation , 2017, AMTA.

[755]  Julian Togelius,et al.  Deep Learning for Video Game Playing , 2017, IEEE Transactions on Games.

[756]  Ruocheng Guo,et al.  A Survey of Learning Causality with Data , 2018, ACM Comput. Surv..

[757]  Turgay Celik,et al.  Toward a Smart Cloud: A Review of Fault-Tolerance Methods in Cloud Systems , 2018, IEEE Transactions on Services Computing.

[758]  Ufuk Topcu,et al.  Constrained Cross-Entropy Method for Safe Reinforcement Learning , 2020, IEEE Transactions on Automatic Control.

[759]  Luc De Raedt,et al.  Relational Reinforcement Learning , 2001, Machine Learning.