Paper information - Deep Learning and Reward Design for Reinforcement Learning - 字舞流文

Deep Learning and Reward Design for Reinforcement Learning

[1] Fei-Fei Li,et al. Large-Scale Video Classification with Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[2] Csaba Szepesvári,et al. Algorithms for Reinforcement Learning , 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.

[3] Michael L. Littman,et al. Potential-based Shaping in Model-based Reinforcement Learning , 2008, AAAI.

[4] Lex Weaver,et al. The Optimal Reward Baseline for Gradient-Based Reinforcement Learning , 2001, UAI.

[5] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.

[6] Jürgen Schmidhuber,et al. Multi-column deep neural networks for image classification , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[7] Fumihide Tanaka,et al. Multitask Reinforcement Learning on the Distribution of MDPs , 2003 .

[8] Jonathan Schaeffer,et al. A World Championship Caliber Checkers Program , 1992, Artif. Intell.

[9] Christopher D. Rosin,et al. Nested Rollout Policy Adaptation for Monte Carlo Tree Search , 2011, IJCAI.

[10] Marc G. Bellemare,et al. The Arcade Learning Environment: An Evaluation Platform for General Agents , 2012, J. Artif. Intell. Res.

[11] Sriraam Natarajan,et al. Dynamic preferences in multi-criteria reinforcement learning , 2005, ICML.

[12] Peter Stone,et al. Transferring Instances for Model-Based Reinforcement Learning , 2008, ECML/PKDD.

[13] Daphne Koller,et al. Making Rational Decisions Using Adaptive Utility Elicitation , 2000, AAAI/IAAI.

[14] Marc G. Bellemare,et al. Investigating Contingency Awareness Using Atari 2600 Games , 2012, AAAI.

[15] Regina Barzilay,et al. Language Understanding for Text-based Games using Deep Reinforcement Learning , 2015, EMNLP.

[16] Brian Sheppard,et al. World-championship-caliber Scrabble , 2002, Artif. Intell.

[17] Milica Gasic,et al. POMDP-Based Statistical Spoken Dialog Systems: A Review , 2013, Proceedings of the IEEE.

[18] Hang Li,et al. Neural Responding Machine for Short-Text Conversation , 2015, ACL.

[19] Nan Jiang,et al. Improving UCT planning via approximate homomorphisms , 2014, AAMAS.

[20] Christopher G. Atkeson,et al. A comparison of direct and model-based reinforcement learning , 1997, Proceedings of International Conference on Robotics and Automation.

[21] Peter Stone,et al. Value-Function-Based Transfer for Reinforcement Learning Using Structure Mapping , 2006, AAAI.

[22] Andrew G. Barto,et al. Building Portable Options: Skill Transfer in Reinforcement Learning , 2007, IJCAI.

[23] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.

[24] Lukás Burget,et al. Recurrent neural network based language model , 2010, INTERSPEECH.

[25] Jason Weston,et al. Towards Understanding Situated Natural Language , 2010, AISTATS.

[26] Geoffrey E. Hinton,et al. Speech recognition with deep recurrent neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[27] Gerald Tesauro,et al. Temporal difference learning and TD-Gammon , 1995, CACM.

[28] Jürgen Schmidhuber,et al. Deep learning in neural networks: An overview , 2014, Neural Networks.

[29] Marc G. Bellemare,et al. Bayesian Learning of Recursively Factored Environments , 2013, ICML.

[30] David Silver,et al. Monte-Carlo tree search and rapid action value estimation in computer Go , 2011, Artif. Intell.

[31] Jason Weston,et al. Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks , 2015, ICLR.

[32] Samy Bengio,et al. Show and tell: A neural image caption generator , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33] David Vandyke,et al. Reward Shaping with Recurrent Neural Networks for Speeding up On-Line Policy Learning in Spoken Dialogue Systems , 2015, SIGDIAL Conference.

[34] Erich Elsen,et al. Deep Speech: Scaling up end-to-end speech recognition , 2014, ArXiv.

[35] Levente Kocsis,et al. Transpositions and move groups in Monte Carlo tree search , 2008, 2008 IEEE Symposium On Computational Intelligence and Games.

[36] Joelle Pineau,et al. Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models , 2015, AAAI.

[37] Quoc V. Le,et al. Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis , 2011, CVPR 2011.

[38] Dumitru Erhan,et al. Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[39] Jason Weston,et al. Dialog-based Language Learning , 2016, NIPS.

[40] Peter Stone,et al. Deep Recurrent Q-Learning for Partially Observable MDPs , 2015, AAAI Fall Symposia.

[41] Andrew Y. Ng,et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping , 1999, ICML.

[42] Sergey Levine,et al. Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models , 2015, ArXiv.

[43] Jürgen Schmidhuber,et al. Evolving large-scale neural networks for vision-based reinforcement learning , 2013, GECCO '13.

[44] Yoshua Bengio,et al. Learning Deep Architectures for AI , 2007, Found. Trends Mach. Learn.

[45] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.

[46] Alex Graves,et al. Playing Atari with Deep Reinforcement Learning , 2013, ArXiv.

[47] Csaba Szepesvári,et al. Bandit Based Monte-Carlo Planning , 2006, ECML.

[48] Peter Stone,et al. Transfer Learning for Reinforcement Learning Domains: A Survey , 2009, J. Mach. Learn. Res.

[49] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[50] Andrew Y. Ng,et al. Algorithms for Inverse Reinforcement Learning , 2000, ICML.

[51] Richard Socher,et al. Ask Me Anything: Dynamic Memory Networks for Natural Language Processing , 2015, ICML.

[52] Jason Weston,et al. Learning End-to-End Goal-Oriented Dialog , 2016, ICLR.

[53] Yoshua Bengio,et al. Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[54] Quoc V. Le,et al. A Neural Conversational Model , 2015, ArXiv.

[55] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[56] David Vandyke,et al. Learning from real users: rating dialogue success with neural networks for reinforcement learning in spoken dialogue systems , 2015, INTERSPEECH.

[57] P. Bartlett,et al. Stochastic optimization of controlled partially observable Markov decision processes , 2000, Proceedings of the 39th IEEE Conference on Decision and Control (Cat. No.00CH37187).

[58] Richard L. Lewis,et al. Reward Design via Online Gradient Ascent , 2010, NIPS.

[59] Honglak Lee,et al. Action-Conditional Video Prediction using Deep Networks in Atari Games , 2015, NIPS.

[60] David Silver,et al. Deep Reinforcement Learning with Double Q-Learning , 2015, AAAI.

[61] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.

[62] Yoshua Bengio,et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.

[63] Jude W. Shavlik,et al. Policy Transfer via Markov Logic Networks , 2009, ILP.

[64] Yoshua Bengio,et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.

[65] Matthew Henderson,et al. Word-Based Dialog State Tracking with Recurrent Neural Networks , 2014, SIGDIAL Conference.

[66] Andrew G. Barto,et al. Autonomous shaping: knowledge transfer in reinforcement learning , 2006, ICML.

[67] Erik Talvitie,et al. Improving Exploration in UCT Using Local Manifolds , 2015, AAAI.

[68] Richard L. Lewis,et al. Optimal Rewards versus Leaf-Evaluation Heuristics in Planning Agents , 2011, AAAI.

[69] Geoffrey E. Hinton,et al. Generating Text with Recurrent Neural Networks , 2011, ICML.

[70] Jason Weston,et al. End-To-End Memory Networks , 2015, NIPS.

[71] Geoffrey J. Gordon,et al. A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning , 2010, AISTATS.

[72] Honglak Lee,et al. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations , 2009, ICML '09.

[73] Marc G. Bellemare,et al. Skip Context Tree Switching , 2014, ICML.

[74] Richard L. Lewis,et al. Intrinsically Motivated Reinforcement Learning: An Evolutionary Perspective , 2010, IEEE Transactions on Autonomous Mental Development.

[75] Rudolf Kadlec,et al. Improved Deep Learning Baselines for Ubuntu Corpus Dialogs , 2015, ArXiv.

[76] Claude E. Shannon,et al. Programming a computer for playing chess , 1950 .

[77] Richard L. Lewis,et al. Reward Mapping for Transfer in Long-Lived Agents , 2013, NIPS.

[78] Shimon Whiteson,et al. Transfer via inter-task mappings in policy search reinforcement learning , 2007, AAMAS '07.

[79] Gerald Tesauro,et al. TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play , 1994, Neural Computation.

[80] Xiang Zhang,et al. Evaluating Prerequisite Qualities for Learning End-to-End Dialog Systems , 2015, ICLR.

[81] Geoffrey E. Hinton,et al. Acoustic Modeling Using Deep Belief Networks , 2012, IEEE Transactions on Audio, Speech, and Language Processing.

[82] Marc G. Bellemare,et al. Sketch-Based Linear Value Function Approximation , 2012, NIPS.

[83] Joelle Pineau,et al. The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn Dialogue Systems , 2015, SIGDIAL Conference.

[84] Pascal Vincent,et al. Visualizing Higher-Layer Features of a Deep Network , 2009 .

[85] Doina Precup,et al. Using Options for Knowledge Transfer in Reinforcement Learning , 1999 .

[86] Risto Miikkulainen,et al. HyperNEAT-GGP: a hyperNEAT-based atari general game player , 2012, GECCO '12.

[87] Geoffrey Zweig,et al. End-to-end LSTM-based dialog control optimized with supervised and reinforcement learning , 2016, ArXiv.

[88] Marc'Aurelio Ranzato,et al. Building high-level features using large scale unsupervised learning , 2011, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[89] Andrea Bonarini,et al. Transfer of samples in batch reinforcement learning , 2008, ICML '08.

[90] Thomas G. Dietterich,et al. State Aggregation in Monte Carlo Tree Search , 2014, AAAI.

[91] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.

[92] Yishay Mansour,et al. A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes , 1999, Machine Learning.

[93] Honglak Lee,et al. Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning , 2014, NIPS.

[94] Michael Buro,et al. Improving heuristic mini-max search by supervised learning , 2002, Artif. Intell.

[95] Honglak Lee,et al. Deep Learning for Reward Design to Improve Monte Carlo Tree Search in ATARI Games , 2016, IJCAI.

[96] Joel Veness,et al. Bootstrapping from Game Tree Search , 2009, NIPS.