Understanding the World Through Action

The recent history of machine learning research has taught us that machine learning methods are most effective when they are provided with very large, high-capacity models and trained on very large and diverse datasets. This has spurred the community to search for ways to remove any bottlenecks to scale. Often the foremost among such bottlenecks is the need for human effort, including the effort of curating and labeling datasets. As a result, considerable attention in recent years has been devoted to utilizing unlabeled data, which can be collected in vast quantities. However, some of the most widely used methods for training on such unlabeled data themselves require human-designed objective functions that must correlate in some meaningful way with downstream tasks. I will argue that a general, principled, and powerful framework for utilizing unlabeled data can be derived from reinforcement learning, using general-purpose unsupervised or self-supervised reinforcement learning objectives in concert with offline reinforcement learning methods that can leverage large datasets. I will discuss how such a procedure is more closely aligned with potential downstream tasks, and how it could build on existing techniques that have been developed in recent years.
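To make the proposed combination concrete, the sketch below illustrates one way a self-supervised objective (hindsight goal relabeling of a reward-free dataset) can be paired with a conservative offline value-learning update on a toy tabular gridworld. Everything here, including the environment, hyperparameters, and the particular regularizer, is an illustrative assumption rather than the specific procedure advocated in the text.

```python
# Minimal sketch (assumptions throughout): self-supervised goal relabeling
# plus a conservative offline Q-learning update on a toy 5x5 gridworld.
import numpy as np

rng = np.random.default_rng(0)
N = 5                                           # grid size; states are (row, col)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]    # up, down, left, right

def step(s, a):
    r, c = s
    dr, dc = ACTIONS[a]
    return (min(max(r + dr, 0), N - 1), min(max(c + dc, 0), N - 1))

# 1) A static, reward-free "offline" dataset collected by a random policy.
dataset = []
for _ in range(200):
    s = (rng.integers(N), rng.integers(N))
    traj = []
    for _ in range(20):
        a = int(rng.integers(len(ACTIONS)))
        s2 = step(s, a)
        traj.append((s, a, s2))
        s = s2
    dataset.append(traj)

# 2) Self-supervised objective: relabel each transition with a goal actually
#    reached later in the same trajectory; reward = 1 iff the next state hits it.
relabeled = []
for traj in dataset:
    for t, (s, a, s2) in enumerate(traj):
        goal = traj[int(rng.integers(t, len(traj)))][2]
        r = float(s2 == goal)
        relabeled.append((s, a, s2, goal, r))

# 3) Offline, goal-conditioned Q-learning on the fixed dataset: a standard TD
#    target plus a simple conservatism penalty that pushes down Q on actions
#    other than the one in the data (a tabular stand-in for CQL-style methods).
Q = np.zeros((N, N, N, N, len(ACTIONS)))        # Q[s_row, s_col, g_row, g_col, a]
gamma, lr, alpha = 0.95, 0.5, 0.1
for _ in range(50):
    for (s, a, s2, g, r) in relabeled:
        q_sa = Q[s[0], s[1], g[0], g[1], a]
        target = r + gamma * (1 - r) * Q[s2[0], s2[1], g[0], g[1]].max()
        penalty = Q[s[0], s[1], g[0], g[1]].mean() - q_sa
        Q[s[0], s[1], g[0], g[1], a] += lr * (target - q_sa - alpha * penalty)

# The resulting Q-function can be queried for any commanded goal, e.g. the
# greedy action toward goal (4, 4) from state (0, 0):
print(int(np.argmax(Q[0, 0, 4, 4])))
```

The point of the sketch is only that no human-designed task reward appears anywhere: the training signal comes from the data itself (which outcomes were reached), while the conservative update accounts for learning entirely from a fixed offline dataset.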
