A data-driven approach for learning to control computers
Petko Georgiev | Tobias Pohlen | T. Lillicrap | David Raposo | Adam Santoro | Josh Abramson | Alistair Muldal | P. Humphreys | Alex Goldin | Gregory Thornton | Rachita Chhaparia
[1] Natasha Jaques, et al. Environment Generation for Zero-Shot Compositional Reinforcement Learning, 2022, NeurIPS.
[2] Jeff Wu, et al. WebGPT: Browser-assisted question-answering with human feedback, 2021, ArXiv.
[3] Po-Sen Huang, et al. Scaling Language Models: Methods, Analysis & Insights from Training Gopher, 2021, ArXiv.
[4] Tamara von Glehn, et al. Creating Multimodal Interactive Agents with Imitation and Self-Supervised Learning, 2021, ArXiv.
[5] Michael S. Bernstein, et al. On the Opportunities and Risks of Foundation Models, 2021, ArXiv.
[6] Wojciech Zaremba, et al. Evaluating Large Language Models Trained on Code, 2021, ArXiv.
[7] Sheila A. McIlraith, et al. AppBuddy: Learning to Accomplish Tasks in Mobile Apps via Reinforcement Learning, 2021, Canadian Conference on AI.
[8] Doina Precup, et al. AndroidEnv: A Reinforcement Learning Platform for Android, 2021, ArXiv.
[9] Jason Weston, et al. How to Motivate Your Dragon: Teaching Goal-Driven Agents to Speak and Act in Fantasy Worlds, 2020, NAACL.
[10] Ryan J. Lowe, et al. Learning to summarize from human feedback, 2020, NeurIPS.
[11] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.
[12] Xin Zhou, et al. Mapping Natural Language Instructions to Mobile UI Action Sequences, 2020, ACL.
[13] Alec Radford, et al. Scaling Laws for Neural Language Models, 2020, ArXiv.
[14] H. Francis Song, et al. V-MPO: On-Policy Maximum a Posteriori Policy Optimization for Discrete and Continuous Control, 2019, ICLR.
[15] Wojciech M. Czarnecki, et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning, 2019, Nature.
[16] M. Shoeybi, et al. Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism, 2019, ArXiv.
[17] Natasha Jaques, et al. Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog, 2019, ArXiv.
[18] Jimmy Ba, et al. DOM-Q-NET: Grounded RL on Structured Language, 2019, ICLR.
[19] Dilek Z. Hakkani-Tür, et al. Learning to Navigate the Web, 2018, ICLR.
[20] Shane Legg, et al. Reward learning from human preferences and demonstrations in Atari, 2018, NeurIPS.
[21] Percy Liang, et al. Reinforcement Learning on Web Interfaces Using Workflow-Guided Exploration, 2018, ICLR.
[22] Percy Liang, et al. World of Bits: An Open-Domain Platform for Web-Based Agents, 2017, ICML.
[23] Shane Legg, et al. Deep Reinforcement Learning from Human Preferences, 2017, NIPS.
[24] Lukasz Kaiser, et al. Attention Is All You Need, 2017, NIPS.
[25] Jason Weston, et al. Dialogue Learning With Human-In-The-Loop, 2016, ICLR.
[26] Johannes Fürnkranz, et al. A Survey of Preference-Based Reinforcement Learning Methods, 2017, J. Mach. Learn. Res.
[27] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[28] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[29] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[30] Dean Pomerleau, et al. ALVINN, an autonomous land vehicle in a neural network, 1988, NIPS.