Backpack Language Models
[1] R. Levy, et al. Probing for Incremental Parse States in Autoregressive Language Models, 2022, EMNLP.
[2] Arnab Sen Sharma, et al. Mass-Editing Memory in a Transformer, 2022, ICLR.
[3] Tom B. Brown, et al. In-context Learning and Induction Heads, 2022, ArXiv.
[4] José Camacho-Collados, et al. Twitter Topic Classification, 2022, COLING.
[5] D. Mahajan, et al. Scalable Interpretability via Polynomials, 2022, NeurIPS.
[6] Daniel Y. Fu, et al. FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness, 2022, NeurIPS.
[7] D. Mahajan, et al. Neural Basis Models for Interpretability, 2022, NeurIPS.
[8] David Bau, et al. Locating and Editing Factual Associations in GPT, 2022, NeurIPS.
[9] Vikram Gupta, et al. Multilingual and Multilabel Emotion Recognition using Virtual Adversarial Training, 2021, MRL.
[10] Albert Gu, et al. Efficiently Modeling Long Sequences with Structured State Spaces, 2021, ICLR.
[11] Martin Jaggi, et al. Obtaining Better Static Word Embeddings Using Contextual Embedding Models, 2021, ACL.
[12] R. Caruana, et al. NODE-GAM: Neural Generalized Additive Model for Interpretable Deep Learning, 2021, ICLR.
[13] Nicola De Cao, et al. Editing Factual Knowledge in Language Models, 2021, EMNLP.
[14] Yejin Choi, et al. MAUVE: Measuring the Gap Between Neural Text and Human Text using Divergence Frontiers, 2021, NeurIPS.
[15] Katrin Erk, et al. When is a bishop not like a rook? When it's like a rabbi! Multi-prototype BERT embeddings for estimating semantic relationships, 2020, CoNLL.
[16] Been Kim, et al. Concept Bottleneck Models, 2020, ICML.
[17] Claire Cardie, et al. Interpreting Pretrained Contextualized Representations via Reductions to Static Embeddings, 2020, ACL.
[18] Jaime Fernández del Río, et al. Array programming with NumPy, 2020, Nature.
[19] Geoffrey E. Hinton, et al. Neural Additive Models: Interpretable Machine Learning with Neural Nets, 2020, NeurIPS.
[20] Agus Sudjianto, et al. GAMI-Net: An Explainable Neural Network based on Generalized Additive Models with Structured Interactions, 2020, Pattern Recognition.
[21] Samuel R. Bowman, et al. BLiMP: The Benchmark of Linguistic Minimal Pairs for English, 2019, Transactions of the Association for Computational Linguistics.
[22] Lysandre Debut, et al. HuggingFace's Transformers: State-of-the-art Natural Language Processing, 2019, ArXiv.
[23] J. Yosinski, et al. Plug and Play Language Models: A Simple Approach to Controlled Text Generation, 2019, ICLR.
[24] Marco Baroni, et al. The emergence of number and syntax units in LSTM language models, 2019, NAACL.
[25] Yonatan Belinkov, et al. Linguistic Knowledge and Transferability of Contextual Representations, 2019, NAACL.
[26] Jieyu Zhao, et al. Gender Bias in Coreference Resolution: Evaluation and Debiasing Methods, 2018, NAACL.
[27] Luke S. Zettlemoyer, et al. Deep Contextualized Word Representations, 2018, NAACL.
[28] Martin Wattenberg, et al. Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV), 2017, ICML.
[29] Willem H. Zuidema, et al. Visualisation and 'diagnostic classifiers' reveal how recurrent and recursive neural networks process hierarchical structure, 2017, Journal of Artificial Intelligence Research.
[30] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[31] Richard Socher, et al. Pointer Sentinel Mixture Models, 2016, ICLR.
[32] Lior Wolf, et al. Using the Output Embedding to Improve Language Models, 2016, EACL.
[33] Felix Hill, et al. SimVerb-3500: A Large-Scale Evaluation Set of Verb Similarity, 2016, EMNLP.
[34] Adam Tauman Kalai, et al. Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings, 2016, NIPS.
[35] Tomas Mikolov, et al. Enriching Word Vectors with Subword Information, 2016, TACL.
[36] Angeliki Lazaridou, et al. The LAMBADA dataset: Word prediction requiring a broad discourse context, 2016, ACL.
[37] Hal Daumé, et al. Deep Unordered Composition Rivals Syntactic Methods for Text Classification, 2015, ACL.
[38] Jeffrey Pennington, et al. GloVe: Global Vectors for Word Representation, 2014, EMNLP.
[39] Felix Hill, et al. SimLex-999: Evaluating Semantic Models With (Genuine) Similarity Estimation, 2014, Computational Linguistics.
[40] Jeffrey Dean, et al. Efficient Estimation of Word Representations in Vector Space, 2013, ICLR.
[41] Geoffrey E. Hinton, et al. Generating Text with Recurrent Neural Networks, 2011, ICML.
[42] Patrick Pantel, et al. From Frequency to Meaning: Vector Space Models of Semantics, 2010, Journal of Artificial Intelligence Research.
[43] Eneko Agirre, et al. A Study on Similarity and Relatedness Using Distributional and WordNet-based Approaches, 2009, NAACL.
[44] Jason Weston, et al. A unified architecture for natural language processing: deep neural networks with multitask learning, 2008, ICML.
[45] Yoshua Bengio, et al. A Neural Probabilistic Language Model, 2003, Journal of Machine Learning Research.
[46] S. Hochreiter, et al. Long Short-Term Memory, 1997, Neural Computation.
[47] H. Schütze, et al. Dimensions of meaning, 1992, Supercomputing '92.
[48] Jeffrey L. Elman, et al. Finding Structure in Time, 1990, Cognitive Science.
[49] John B. Goodenough, et al. Contextual correlates of synonymy, 1965, CACM.
[50] Pierre Lison, et al. Assessing the Quality of Human-Generated Summaries with Weakly Supervised Learning, 2021, NODALIDA.
[51] Ilya Sutskever, et al. Language Models are Unsupervised Multitask Learners, 2019.
[52] Alec Radford, et al. Improving Language Understanding by Generative Pre-Training, 2018.
[53] Douglas L. T. Rohde, et al. An Improved Model of Semantic Similarity Based on Lexical Co-Occurrence, 2005.
[54] Stefan Sperlich, et al. Generalized Additive Models, 2014.