A data-driven approach for learning to control computers

It would be useful for machines to use computers as humans do so that they can aid us in everyday tasks. This is a setting in which there is also the potential to leverage large-scale expert demonstrations and human judgements of interactive behaviour, which are two ingredients that have driven much recent success in AI. Here we investigate the setting of computer control using keyboard and mouse, with goals specified via natural language. Instead of focusing on hand-designed curricula and specialized action spaces, we focus on developing a scalable method centered on reinforcement learning combined with behavioural priors informed by actual human-computer interactions. We achieve state-of-the-art and human-level mean performance across all tasks within the MiniWob++ benchmark, a challenging suite of computer control problems, and find strong evidence of cross-task transfer. These results demonstrate the usefulness of a unified human-agent interface when training machines to use computers. Altogether our results suggest a formula for achieving competency beyond MiniWob++ and towards controlling computers, in general, as a human would.
