Conditional probing: measuring usable information beyond a baseline

Probing experiments investigate the extent to which neural representations make properties—like part-of-speech—predictable. One convention suggests that a representation encodes a property if probing that representation produces higher accuracy than probing a baseline representation like non-contextual word embeddings. Instead of using baselines as a point of comparison, we're interested in measuring information that is contained in the representation but not in the baseline. For example, current methods can detect when a representation is more useful than the word identity (a baseline) for predicting part-of-speech; however, they cannot detect when the representation is predictive of just the aspects of part-of-speech not explainable by the word identity. In this work, we extend a theory of usable information called V-information and propose conditional probing, which explicitly conditions on the information in the baseline. In a case study, we find that after conditioning on non-contextual word embeddings, properties like part-of-speech are accessible at deeper layers of a network than previously thought.
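To make the idea concrete, below is a minimal sketch (not the authors' released code) of how conditional probing can be estimated under the V-information framing: a probe from a fixed family is trained once on the baseline alone (with the representation zeroed out) and once on the baseline concatenated with the representation, and the drop in held-out cross-entropy is read as the usable information the representation adds beyond the baseline. The linear probe family, the synthetic data, and all function names are illustrative assumptions.

```python
# Sketch of conditional V-information estimation with a linear probe family.
# All data here is synthetic; on real data the baseline would be non-contextual
# embeddings, the representation a contextual layer, and labels e.g. POS tags.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss


def probe_cross_entropy(X_train, y_train, X_test, y_test):
    """Fit a linear probe and return its held-out cross-entropy (in nats)."""
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    return log_loss(y_test, clf.predict_proba(X_test), labels=clf.classes_)


def conditional_v_information(baseline, rep, labels, split=0.8):
    """Estimate I_V(rep -> labels | baseline):
    usable information the representation contributes beyond the baseline,
    as H_V(labels | baseline) - H_V(labels | baseline, rep)."""
    n = int(split * len(labels))
    # H_V(Y | baseline): the probe sees the baseline, representation zeroed out,
    # so the probe family is identical in both conditions.
    X_base = np.hstack([baseline, np.zeros_like(rep)])
    # H_V(Y | baseline, rep): the probe sees baseline and representation.
    X_both = np.hstack([baseline, rep])
    h_base = probe_cross_entropy(X_base[:n], labels[:n], X_base[n:], labels[n:])
    h_both = probe_cross_entropy(X_both[:n], labels[:n], X_both[n:], labels[n:])
    return h_base - h_both


# Toy usage with random "embeddings" and part-of-speech-like labels.
rng = np.random.default_rng(0)
baseline = rng.normal(size=(2000, 50))   # stand-in for non-contextual embeddings
rep = rng.normal(size=(2000, 50))        # stand-in for a contextual layer
labels = rng.integers(0, 5, size=2000)   # stand-in for a 5-tag property
print(conditional_v_information(baseline, rep, labels))
```

A value near zero (as expected on random data) indicates the representation adds no usable information beyond the baseline under this probe family; a positive value indicates the representation makes aspects of the property accessible that the baseline does not.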
