BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to their widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
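
Because the models and code are released publicly, the checkpoints can be queried with standard causal-language-model tooling. The sketch below is a minimal, illustrative example (not the authors' evaluation code) assuming the Hugging Face `transformers` library and the `bigscience/bloom` family of released checkpoints; the small `bigscience/bloom-560m` variant is used here so the example runs on modest hardware, and the full 176B checkpoint can be substituted given sufficient resources.

```python
# Minimal sketch: prompting a released BLOOM checkpoint with the Hugging Face
# `transformers` library. "bigscience/bloom-560m" is a small public variant used
# for illustration; "bigscience/bloom" is the full 176B model.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom-560m"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# BLOOM is a decoder-only causal LM, so a task is posed as a text prompt and the
# model continues it left to right (greedy decoding here, for determinism).
prompt = "Translate to French: The cat sat on the mat. Translation:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```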

Alexander M. Rush | Dragomir R. Radev | Ona de Gibert | Stephen H. Bach | David Ifeoluwa Adelani | Alham Fikri Aji | Hyung Won Chung | Genta Indra Winata | Mike Tian-Jian Jiang | Daniel H Garrette | Tiago Timponi Torrent | M Saiful Bari | Zheng Xin Yong | Chris C. Emezue | Teven Le Scao | Jason Alan Fries | Patrick von Platen | Leandro von Werra | Nihal V. Nayak | Oskar van der Wal | Shamsuddeen Hassan Muhammad | Stella Rose Biderman | Javier de la Rosa | Carlos Muñoz Ferrandis | Maged S. Al-shaibani | Abhinav Ramesh Kashyap | Julio Bonis Sanz | Eduardo G. Ponferrada | Sabrina J. Mielke | Pawan Sasanka Ammanamanchi | Pedro Ortiz Suarez | Albert Villanova del Moral | Samyam Rajbhandari | Jeff Rasley | Olatunji Ruwase | M. Shoeybi | J. Casper | Iz Beltagy | Kyle Lo | D. Narayanan | Colin Raffel | Jesse Dodge | Yacine Jernite | Ofir Press | Angela Fan | Margaret Mitchell | Danish Contractor | Minjia Zhang | Aitor Soroa Etxabe | Max Ryabinin | Irene Solaiman | Adam Roberts | Sebastian Gehrmann | Urmish Thakker | Benoît Sagot | Gully A. Burns | Ehud Reiter | Thomas Wolf | Germán Kruszewski | Veronika Laippala | Sampo Pyysalo | Marine Carpuat | Benjamin Heinzerling | D. Tunuguntla | Antonio Miranda-Escalada | A. Callahan | Dian Yu | Hendrik Strobelt | M. Samwald | Pascale Fung | Jungo Kasai | Itziar Gonzalez-Dios | Michael McKenna | Sheng Shen | Jonathan Chang | Nazneen Rajani | Conglong Li | Isaac Johnson | Thibault Févry | Nora Kassner | Anna Rogers | Chenglei Si | Elizabeth Salesky | Verena Rieser | Jekaterina Novikova | François Yvon | Rachel Bawden | Tristan Thrush | Julien Launay | Christopher Klamm | Aaron Gokaslan | Simon Ott | Tatiana Shavrina | B. Ajibade | Matteo Manica | Najoung Kim | Taewoon Kim | Douwe Kiela | Niklas Muennighoff | Nafis Abrar | J. Forde | Zhiqing Sun | Vikas Raunak | Anne-Laure Ligozat | Jian Zhu | S. Longpre | Newton Cheng | Azadeh HajiHosseini | Antoine Chaffin | Thomas Scialom | Sourav Roy | Shaden Smith | Vassilina Nikoulina | S. Viguier | Gunjan Chhablani | N. Muellner | A. Feizpour | Myriam Peyrounette | V. Danchev | Maximin Coavoux | Mayank Singh | Debajyoti Datta | J. Golde | R. López | Luisa Shinzato | Alice Rueda | J. Bhattacharjee | Edward Tan | Olivier Nguyen | Matthias Gallé | Zifan Ye | N. Dahlberg | Arjun Subramonian | R. Lacroix | Clémentine Fourrier | I. Nejadgholi | Lu Liu | Yanis Labrak | Minna Liu | Albert Webson | D. Lansky | John Giorgi | Canwen Xu | Samuel Albanie | Wojciech Kusa | Harshit Pandey | Daniel Hesslow | S. Alizadeh | Victor Sanh | Zaid Alyafeai | Arnaud Stiegler | Arun Raja | Manan Dey | Shanya Sharma | Eliza Szczechla | Han Wang | Thomas Wang | Trishala Neeraj | Jos Rozen | Abheesht Sharma | Andrea Santilli | Ryan Teehan | Leo Gao | T. Bers | Rui Zhang | Leon Weber | R. Ribeiro | Jason Phang | Jordan Clive | Peter Henderson | Nishant Subramani | A. Luccioni | R. Kromann | Pierre Colombo | Srishti Kumar | L. Tanguy | Samuel Cahyawijaya | Jenny Chim | Ken Kawamura | Mustafa Ghaleb | V. Mikhailov | Myungsun Kang | Idris Abdulmumin | Hady ElSahar | Colin Leong | Hieu Tran | Fatim T Mirza | Indrani Bhattacharya | Stefan Schweter | Jörg Frohberg | Tim Dettmers | Ahmed Baruwa | Joshua Seltzer | Elizabeth-Jane Pavlick | Huu Nguyen | Maraim Masoud | Samson Tan | Gérard Dupont | Zeerak Talat | Somaieh Nikpoor | Rishi Bommasani | Christopher Akiki | Karthi Sivaraman | Yada Pruksachatkun | A. Tammour | Yonatan Belinkov | F. Toni | Enrique Manjavacas | Daniel Alexander van Strien | Natasha Seelam | Gabriel Altay | Ruisi Su | Samuele Garda | Bo Wang | Fabio Barth | Mario Sanger | Daniel León Periñán | Théo Gigant | J. Posada | Marc Pàmies | Marianna Nezhurina | Robert Martin | Michael Cullan | Shamik Bose | Shlok S Deshmukh | Sid Kiblawi | Benjamin Beilharz | Hugo Laurençon | Ethan Kim | Timo Schick | Paulo Villegas | Jaesung Tae | Quentin Lhoest | Lucile Saulnier | Davis David | Salomey Osei | Nurulaqilla Khamis | Chenxi Zhou | Habib Rezanejad | J. Tow | Charles Lovering | Jan-Christoph Kalo | S. Zink | Amit Alfassy | Michael Weinberg | Long Phan | Angelina McMillan-Major | Mayank Mishra | T. A. Laud | Wilson Y. Lee | M. Muñoz | Tomasz Limisiewicz | Eli Bogdanov | Sanchit Gandhi | Ying Xu | Ekaterina Taktasheva | Oleg Serikov | V. Protasov | E. Voloshina | Adi Simhi | Hailey Schoelkopf | Omer Antverg | Lintang Sutawika | Y. Venkatraman | M. Freidank | Y. Uri | B. Saxena | Silas L. Wang | S. Pais | Suzana Ilić | Roman Castagné | Stas Bekman | Ariel Kreisberg Nitzav | Chenghao Mou | Efrat Levkovizh | E. Natan | Giada Pistilli | Hamza Benyamina | Ian Yu | Josephine L. Tobing | Khalid Almubarak | Kimbo Chen | María Grandury | Mario Šaško | Max Huang | Minh Chien Vu | M. A. Jauhar | Omar Espejel | Priscilla Amuok | Rheza Harliman | Sebastian Nagel | Stanislav Silberberg | S. Pai | Violette Lepercq | V. Prabhu | Srulik Ben-David | Xiang Tang | Shaked Brody | Hadar Tojarieh | Hatim Bourfoune | N. Patry | Nouamane Tazi | Omar Sanseviero | Pierre Cornette | Pierre François Lavallée | S. Requena | Suraj Patil | Anastasia Cheveleva | Aurélie Névéol | Liam Hazan | Miruna Clinciu | Tian Yun | Zachary Bamberger | Zdeněk Kasner | Amanda Pestana | Ammar Khan | Amy Faranak | A. Santos | A. Hevia | Antigona Unldreaj | Arash Aghagol | Arezoo Abdollahi | Bahareh Behroozi | D. A. Nguyen | Emily Baylor | Ezinwanne Ozoani | Frankline Ononiwu | H.A. Jones | Irina Sedenko | J. Passmore | L. Dutra | Mairon Samagaio | Maraim Elbadri | Marissa Gerchick | Martha Akinlolu | Mike Qiu | M. Ghauri | Mykola Burynok | Nour Elkott | N. Fahmy | O. Samuel | Ran An | Ryan Hao | Sarmad Shubber | Thanh-Cong Le | Tobi Oyebade | T. Le | Yoyo Yang | Z. Nguyen | Alfredo Palasciano | Anima Shukla | A. Singh | C. Brito | Chirag Jain | Chuxin Xu | Daniel Molano | Florian Fuhrimann | Giyaseddin Bayrak | Helena U. Vrabec | I. Bello | Isha Dash | J. Kang | Lokesh Bulchandani | Madeleine Hahn de Bykhovetz | Maiko Takeuchi | M. A. Castillo | M. Wolf | Mina Mihaljcic | N. Broad | Patricia Haller | R. Chandrasekhar | R. Eisenberg | Rodrigo L. Canalli | Rosaline Su | Shubhanshu Mishra | Sinee Sang-aroonsiri | S. Bharati | Tomoya Kainuma | Yashasvi Bajaj | Yifan Xu | Z. Tan | Zhongli Xie | M. Bras | Younes Belkada | Loubna Ben Allal | A. Singh | Ruochen Zhang | Karen Fort | M. Mieskes | Yun-chao Xu | Rui Ribeiro | Amanpreet Singh
