Deep Learning and Reward Design for Reinforcement Learning
[1] Fei-Fei Li,et al. Large-Scale Video Classification with Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[2] Csaba Szepesvári,et al. Algorithms for Reinforcement Learning , 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.
[3] Michael L. Littman,et al. Potential-based Shaping in Model-based Reinforcement Learning , 2008, AAAI.
[4] Lex Weaver,et al. The Optimal Reward Baseline for Gradient-Based Reinforcement Learning , 2001, UAI.
[5] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[6] Jürgen Schmidhuber,et al. Multi-column deep neural networks for image classification , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.
[7] Fumihide Tanaka,et al. Multitask Reinforcement Learning on the Distribution of MDPs , 2003 .
[8] Jonathan Schaeffer,et al. A World Championship Caliber Checkers Program , 1992, Artif. Intell..
[9] Christopher D. Rosin,et al. Nested Rollout Policy Adaptation for Monte Carlo Tree Search , 2011, IJCAI.
[10] Marc G. Bellemare,et al. The Arcade Learning Environment: An Evaluation Platform for General Agents , 2012, J. Artif. Intell. Res..
[11] Sriraam Natarajan,et al. Dynamic preferences in multi-criteria reinforcement learning , 2005, ICML.
[12] Peter Stone,et al. Transferring Instances for Model-Based Reinforcement Learning , 2008, ECML/PKDD.
[13] Daphne Koller,et al. Making Rational Decisions Using Adaptive Utility Elicitation , 2000, AAAI/IAAI.
[14] Marc G. Bellemare,et al. Investigating Contingency Awareness Using Atari 2600 Games , 2012, AAAI.
[15] Regina Barzilay,et al. Language Understanding for Text-based Games using Deep Reinforcement Learning , 2015, EMNLP.
[16] Brian Sheppard,et al. World-championship-caliber Scrabble , 2002, Artif. Intell..
[17] Milica Gasic,et al. POMDP-Based Statistical Spoken Dialog Systems: A Review , 2013, Proceedings of the IEEE.
[18] Hang Li,et al. Neural Responding Machine for Short-Text Conversation , 2015, ACL.
[19] Nan Jiang,et al. Improving UCT planning via approximate homomorphisms , 2014, AAMAS.
[20] Christopher G. Atkeson,et al. A comparison of direct and model-based reinforcement learning , 1997, Proceedings of International Conference on Robotics and Automation.
[21] Peter Stone,et al. Value-Function-Based Transfer for Reinforcement Learning Using Structure Mapping , 2006, AAAI.
[22] Andrew G. Barto,et al. Building Portable Options: Skill Transfer in Reinforcement Learning , 2007, IJCAI.
[23] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[24] Lukás Burget,et al. Recurrent neural network based language model , 2010, INTERSPEECH.
[25] Jason Weston,et al. Towards Understanding Situated Natural Language , 2010, AISTATS.
[26] Geoffrey E. Hinton,et al. Speech recognition with deep recurrent neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[27] Gerald Tesauro,et al. Temporal difference learning and TD-Gammon , 1995, CACM.
[28] Jürgen Schmidhuber,et al. Deep learning in neural networks: An overview , 2014, Neural Networks.
[29] Marc G. Bellemare,et al. Bayesian Learning of Recursively Factored Environments , 2013, ICML.
[30] David Silver,et al. Monte-Carlo tree search and rapid action value estimation in computer Go , 2011, Artif. Intell..
[31] Jason Weston,et al. Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks , 2015, ICLR.
[32] Samy Bengio,et al. Show and tell: A neural image caption generator , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[33] David Vandyke,et al. Reward Shaping with Recurrent Neural Networks for Speeding up On-Line Policy Learning in Spoken Dialogue Systems , 2015, SIGDIAL Conference.
[34] Erich Elsen,et al. Deep Speech: Scaling up end-to-end speech recognition , 2014, ArXiv.
[35] Levente Kocsis,et al. Transpositions and move groups in Monte Carlo tree search , 2008, 2008 IEEE Symposium On Computational Intelligence and Games.
[36] Joelle Pineau,et al. Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models , 2015, AAAI.
[37] Quoc V. Le,et al. Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis , 2011, CVPR 2011.
[38] Dumitru Erhan,et al. Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[39] Jason Weston,et al. Dialog-based Language Learning , 2016, NIPS.
[40] Peter Stone,et al. Deep Recurrent Q-Learning for Partially Observable MDPs , 2015, AAAI Fall Symposia.
[41] Andrew Y. Ng,et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping , 1999, ICML.
[42] Sergey Levine,et al. Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models , 2015, ArXiv.
[43] Jürgen Schmidhuber,et al. Evolving large-scale neural networks for vision-based reinforcement learning , 2013, GECCO '13.
[44] Yoshua Bengio,et al. Learning Deep Architectures for AI , 2007, Found. Trends Mach. Learn..
[45] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[46] Alex Graves,et al. Playing Atari with Deep Reinforcement Learning , 2013, ArXiv.
[47] Csaba Szepesvári,et al. Bandit Based Monte-Carlo Planning , 2006, ECML.
[48] Peter Stone,et al. Transfer Learning for Reinforcement Learning Domains: A Survey , 2009, J. Mach. Learn. Res..
[49] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[50] Andrew Y. Ng,et al. Algorithms for Inverse Reinforcement Learning , 2000, ICML.
[51] Richard Socher,et al. Ask Me Anything: Dynamic Memory Networks for Natural Language Processing , 2015, ICML.
[52] Jason Weston,et al. Learning End-to-End Goal-Oriented Dialog , 2016, ICLR.
[53] Yoshua Bengio,et al. Gradient-based learning applied to document recognition , 1998, Proc. IEEE.
[54] Quoc V. Le,et al. A Neural Conversational Model , 2015, ArXiv.
[55] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[56] David Vandyke,et al. Learning from real users: rating dialogue success with neural networks for reinforcement learning in spoken dialogue systems , 2015, INTERSPEECH.
[57] P. Bartlett,et al. Stochastic optimization of controlled partially observable Markov decision processes , 2000, Proceedings of the 39th IEEE Conference on Decision and Control (Cat. No.00CH37187).
[58] Richard L. Lewis,et al. Reward Design via Online Gradient Ascent , 2010, NIPS.
[59] Honglak Lee,et al. Action-Conditional Video Prediction using Deep Networks in Atari Games , 2015, NIPS.
[60] David Silver,et al. Deep Reinforcement Learning with Double Q-Learning , 2015, AAAI.
[61] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[62] Yoshua Bengio,et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.
[63] Jude W. Shavlik,et al. Policy Transfer via Markov Logic Networks , 2009, ILP.
[64] Yoshua Bengio,et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.
[65] Matthew Henderson,et al. Word-Based Dialog State Tracking with Recurrent Neural Networks , 2014, SIGDIAL Conference.
[66] Andrew G. Barto,et al. Autonomous shaping: knowledge transfer in reinforcement learning , 2006, ICML.
[67] Erik Talvitie,et al. Improving Exploration in UCT Using Local Manifolds , 2015, AAAI.
[68] Richard L. Lewis,et al. Optimal Rewards versus Leaf-Evaluation Heuristics in Planning Agents , 2011, AAAI.
[69] Geoffrey E. Hinton,et al. Generating Text with Recurrent Neural Networks , 2011, ICML.
[70] Jason Weston,et al. End-To-End Memory Networks , 2015, NIPS.
[71] Geoffrey J. Gordon,et al. A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning , 2010, AISTATS.
[72] Honglak Lee,et al. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations , 2009, ICML '09.
[73] Marc G. Bellemare,et al. Skip Context Tree Switching , 2014, ICML.
[74] Richard L. Lewis,et al. Intrinsically Motivated Reinforcement Learning: An Evolutionary Perspective , 2010, IEEE Transactions on Autonomous Mental Development.
[75] Rudolf Kadlec,et al. Improved Deep Learning Baselines for Ubuntu Corpus Dialogs , 2015, ArXiv.
[76] Claude E. Shannon,et al. Programming a computer for playing chess , 1950 .
[77] Richard L. Lewis,et al. Reward Mapping for Transfer in Long-Lived Agents , 2013, NIPS.
[78] Shimon Whiteson,et al. Transfer via inter-task mappings in policy search reinforcement learning , 2007, AAMAS '07.
[79] Gerald Tesauro,et al. TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play , 1994, Neural Computation.
[80] Xiang Zhang,et al. Evaluating Prerequisite Qualities for Learning End-to-End Dialog Systems , 2015, ICLR.
[81] Geoffrey E. Hinton,et al. Acoustic Modeling Using Deep Belief Networks , 2012, IEEE Transactions on Audio, Speech, and Language Processing.
[82] Marc G. Bellemare,et al. Sketch-Based Linear Value Function Approximation , 2012, NIPS.
[83] Joelle Pineau,et al. The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn Dialogue Systems , 2015, SIGDIAL Conference.
[84] Pascal Vincent,et al. Visualizing Higher-Layer Features of a Deep Network , 2009 .
[85] Doina Precup,et al. Using Options for Knowledge Transfer in Reinforcement Learning , 1999 .
[86] Risto Miikkulainen,et al. HyperNEAT-GGP: a hyperNEAT-based atari general game player , 2012, GECCO '12.
[87] Geoffrey Zweig,et al. End-to-end LSTM-based dialog control optimized with supervised and reinforcement learning , 2016, ArXiv.
[88] Marc'Aurelio Ranzato,et al. Building high-level features using large scale unsupervised learning , 2011, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[89] Andrea Bonarini,et al. Transfer of samples in batch reinforcement learning , 2008, ICML '08.
[90] Thomas G. Dietterich,et al. State Aggregation in Monte Carlo Tree Search , 2014, AAAI.
[91] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
[92] Yishay Mansour,et al. A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes , 1999, Machine Learning.
[93] Honglak Lee,et al. Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning , 2014, NIPS.
[94] Michael Buro,et al. Improving heuristic mini-max search by supervised learning , 2002, Artif. Intell..
[95] Honglak Lee,et al. Deep Learning for Reward Design to Improve Monte Carlo Tree Search in ATARI Games , 2016, IJCAI.
[96] Joel Veness,et al. Bootstrapping from Game Tree Search , 2009, NIPS.