Understanding the World Through Action

The recent history of machine learning research has taught us that machine learning methods are most effective when they are provided with very large, high-capacity models and trained on very large and diverse datasets. This has spurred the community to search for ways to remove any bottlenecks to scale. Often the foremost among such bottlenecks is the need for human effort, including the effort of curating and labeling datasets. As a result, considerable attention in recent years has been devoted to utilizing unlabeled data, which can be collected in vast quantities. However, some of the most widely used methods for training on such unlabeled data themselves require human-designed objective functions that must correlate in some meaningful way with downstream tasks. I will argue that a general, principled, and powerful framework for utilizing unlabeled data can be derived from reinforcement learning, using general-purpose unsupervised or self-supervised reinforcement learning objectives in concert with offline reinforcement learning methods that can leverage large datasets. I will discuss how such a procedure is more closely aligned with potential downstream tasks, and how it could build on existing techniques that have been developed in recent years.
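To make the proposed combination concrete, the sketch below illustrates one way a self-supervised objective (hindsight goal relabeling of a reward-free dataset) can be paired with a conservative offline value-learning update on a toy tabular gridworld. Everything here, including the environment, hyperparameters, and the particular regularizer, is an illustrative assumption rather than the specific procedure advocated in the text.

```python
# Minimal sketch (assumptions throughout): self-supervised goal relabeling
# plus a conservative offline Q-learning update on a toy 5x5 gridworld.
import numpy as np

rng = np.random.default_rng(0)
N = 5                                           # grid size; states are (row, col)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]    # up, down, left, right

def step(s, a):
    r, c = s
    dr, dc = ACTIONS[a]
    return (min(max(r + dr, 0), N - 1), min(max(c + dc, 0), N - 1))

# 1) A static, reward-free "offline" dataset collected by a random policy.
dataset = []
for _ in range(200):
    s = (rng.integers(N), rng.integers(N))
    traj = []
    for _ in range(20):
        a = int(rng.integers(len(ACTIONS)))
        s2 = step(s, a)
        traj.append((s, a, s2))
        s = s2
    dataset.append(traj)

# 2) Self-supervised objective: relabel each transition with a goal actually
#    reached later in the same trajectory; reward = 1 iff the next state hits it.
relabeled = []
for traj in dataset:
    for t, (s, a, s2) in enumerate(traj):
        goal = traj[int(rng.integers(t, len(traj)))][2]
        r = float(s2 == goal)
        relabeled.append((s, a, s2, goal, r))

# 3) Offline, goal-conditioned Q-learning on the fixed dataset: a standard TD
#    target plus a simple conservatism penalty that pushes down Q on actions
#    other than the one in the data (a tabular stand-in for CQL-style methods).
Q = np.zeros((N, N, N, N, len(ACTIONS)))        # Q[s_row, s_col, g_row, g_col, a]
gamma, lr, alpha = 0.95, 0.5, 0.1
for _ in range(50):
    for (s, a, s2, g, r) in relabeled:
        q_sa = Q[s[0], s[1], g[0], g[1], a]
        target = r + gamma * (1 - r) * Q[s2[0], s2[1], g[0], g[1]].max()
        penalty = Q[s[0], s[1], g[0], g[1]].mean() - q_sa
        Q[s[0], s[1], g[0], g[1], a] += lr * (target - q_sa - alpha * penalty)

# The resulting Q-function can be queried for any commanded goal, e.g. the
# greedy action toward goal (4, 4) from state (0, 0):
print(int(np.argmax(Q[0, 0, 4, 4])))
```

The point of the sketch is only that no human-designed task reward appears anywhere: the training signal comes from the data itself (which outcomes were reached), while the conservative update accounts for learning entirely from a fixed offline dataset.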
