Neuron to Graph: Interpreting Language Model Neurons at Scale
[1] Fazl Barez, et al. System III: Learning with Domain Knowledge for Safety Constraints, 2023, arXiv.
[2] Tom B. Brown, et al. In-context Learning and Induction Heads, 2022, arXiv.
[3] Dario Amodei, et al. Toy Models of Superposition, 2022, arXiv.
[4] Nicholas Carlini, et al. Unsolved Problems in ML Safety, 2021, arXiv.
[5] Li Dong, et al. Knowledge Neurons in Pretrained Transformers, 2021, ACL.
[6] Martin Wattenberg, et al. An Interpretability Illusion for BERT, 2021, arXiv.
[7] Alec Radford, et al. Multimodal Neurons in Artificial Neural Networks, 2021.
[8] Ludwig Schubert, et al. High/Low frequency detectors, 2021.
[9] Charles Foster, et al. The Pile: An 800GB Dataset of Diverse Text for Language Modeling, 2020, arXiv.
[10] Omer Levy, et al. Transformer Feed-Forward Layers Are Key-Value Memories, 2020, EMNLP.
[11] Yonatan Belinkov, et al. Analyzing Individual Neurons in Pre-trained Language Models, 2020, EMNLP.
[12] Ryan Cotterell, et al. Intrinsic Probing through Dimension Selection, 2020, EMNLP.
[13] Jacob Andreas, et al. Compositional Explanations of Neurons, 2020, NeurIPS.
[14] Nick Cammarata, et al. An Overview of Early Vision in InceptionV1, 2020.
[15] Nick Cammarata, et al. Zoom In: An Introduction to Circuits, 2020.
[16] Thomas Wolf, et al. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, 2019, arXiv.
[17] Yonatan Belinkov, et al. What Is One Grain of Sand in the Desert? Analyzing Individual Neurons in Deep NLP Models, 2018, AAAI.
[18] Yonatan Belinkov, et al. Identifying and Controlling Important Neurons in Neural Machine Translation, 2018, ICLR.
[19] Hinrich Schütze, et al. Interpretable Textual Neuron Representations for NLP, 2018, BlackboxNLP@EMNLP.
[20] Ilya Sutskever, et al. Learning to Generate Reviews and Discovering Sentiment, 2017, arXiv.
[21] John Schulman, et al. Concrete Problems in AI Safety, 2016, arXiv.
[22] Deborah Silver, et al. Feature Visualization, 1994, Scientific Visualization.
[23] Hassan Sajjad, et al. Implicit representations of event properties within contextual language models: Searching for “causativity neurons”, 2021, IWCS.
[24] Yonatan Belinkov, et al. Investigating Gender Bias in Language Models Using Causal Mediation Analysis, 2020, NeurIPS.
[25] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.