Scaling Language Models: Methods, Analysis & Insights from Training Gopher

Language modelling provides a step towards intelligent communication systems by harnessing large repositories of written human knowledge to better predict and understand the world. In this paper, we present an analysis of Transformer-based language model performance across a wide range of model scales -- from models with tens of millions of parameters up to a 280 billion parameter model called Gopher. These models are evaluated on 152 diverse tasks, achieving state-of-the-art performance across the majority. Gains from scale are largest in areas such as reading comprehension, fact-checking, and the identification of toxic language, but logical and mathematical reasoning see less benefit. We provide a holistic analysis of the training dataset and the model's behaviour, covering the intersection of model scale with bias and toxicity. Finally, we discuss the application of language models to AI safety and the mitigation of downstream harms.

Po-Sen Huang | Erich Elsen | Arthur Mensch | Demis Hassabis | Chris Dyer | Simon Osindero | Oriol Vinyals | Karen Simonyan | Koray Kavukcuoglu | David Budden | Albin Cassirer | Sumanth Dathathri | Jeff Stanway | Angeliki Lazaridou | Jonathan Uesato | Michela Paganini | John Aslanides | Laurent Sifre | Jacob Menick | Elena Gribovskaya | Antonia Creswell | Blake A. Hechtman | Laura Rimell | Johannes Welbl | Doug Fritz | Lisa Anne Hendricks | Maria Tsimpoukelli | Siddhant M. Jayakumar | Aida Nematzadeh | Aidan Clark | Diego de Las Casas | Irina Higgins | Jordan Hoffmann | Jean-Baptiste Lespiau | Vladimir Mikulik | Jack W. Rae | Laura Weidinger | Adhiguna Kuncoro | Sebastian Borgeaud | Iason Gabriel | Daniel Toyama | William S. Isaac | Cyprien de Masson d'Autume | Aurelia Guy | Igor Babuschkin | Roman Ring | George van den Driessche | Domenic Donato | Susannah Young | Elena Buchatskaya | Kareem Ayoub | John F. J. Mellor | Xiang Lorraine Li | Maribeth Rauh | Geoffrey Irving | Trevor Cai | Eliza Rutherford | Katie Millican | Saffron Huang | Chris Jones | Tayfun Terzi | Francis Song | Amelia Glaese | Yujia Li | Nat McAleese | Sarah Henderson | Tom Hennigan | Richard Powell | Amy Wu | Esme Sutherland | Lena Martens | Nikolai Grigorev | Thibault Sottiaux | Mantas Pajarskas | Toby Pohlen | Zhitao Gong | James Bradbury | Matthew Johnson | Ed Lockhart | Lorrayne Bennett