Low-Complexity Probing via Finding Subnetworks

The dominant approach in probing neural networks for linguistic properties is to train a new shallow multi-layer perceptron (MLP) on top of the model’s internal representations. This approach can detect properties encoded in the model, but at the cost of adding new parameters that may learn the task directly. We instead propose a subtractive pruning-based probe, where we find an existing subnetwork that performs the linguistic task of interest. Compared to an MLP, the subnetwork probe achieves both higher accuracy on pre-trained models and lower accuracy on random models, so it is both better at finding properties of interest and worse at learning on its own. Next, by varying the complexity of each probe, we show that subnetwork probing Pareto-dominates MLP probing in that it achieves higher accuracy given any budget of probe complexity. Finally, we analyze the resulting subnetworks across various tasks to locate where each task is encoded, and we find that lower-level tasks are captured in lower layers, reproducing similar findings in past work.
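The core idea can be illustrated with a short sketch. This is not the authors' released implementation: the class and helper names (`MaskedLinear`, `wrap_linears`) are hypothetical, the toy encoder stands in for a pretrained model such as BERT, and a simple sigmoid relaxation with an L1-style penalty stands in for the L0/hard-concrete mask machinery typically used for differentiable pruning. It only shows the general recipe of freezing the weights and training per-weight mask logits so that the surviving subnetwork solves the probing task.

```python
# Minimal sketch: learn a (relaxed) binary mask over frozen weights so that the
# resulting subnetwork performs a token-labeling probe task. Only the mask
# logits are optimized; the pretrained weights stay fixed.
import torch
import torch.nn as nn


class MaskedLinear(nn.Module):
    """Wraps a frozen nn.Linear and learns a per-weight mask over it."""

    def __init__(self, linear: nn.Linear, temperature: float = 0.5):
        super().__init__()
        self.weight = nn.Parameter(linear.weight.detach(), requires_grad=False)
        self.bias = (nn.Parameter(linear.bias.detach(), requires_grad=False)
                     if linear.bias is not None else None)
        # Initialize logits so sigmoid(logits) is near 1: start from the full network.
        self.mask_logits = nn.Parameter(torch.full_like(self.weight, 3.0))
        self.temperature = temperature

    def forward(self, x):
        mask = torch.sigmoid(self.mask_logits / self.temperature)
        return nn.functional.linear(x, self.weight * mask, self.bias)

    def sparsity_penalty(self):
        # Pushes mask entries toward zero; a crude stand-in for L0 regularization.
        return torch.sigmoid(self.mask_logits / self.temperature).sum()


def wrap_linears(module: nn.Module):
    """Recursively replace nn.Linear submodules with MaskedLinear wrappers."""
    for name, child in module.named_children():
        if isinstance(child, nn.Linear):
            setattr(module, name, MaskedLinear(child))
        else:
            wrap_linears(child)


# Hypothetical usage with a toy encoder; a pretrained HuggingFace BertModel
# could be wrapped the same way.
encoder = nn.Sequential(nn.Linear(768, 768), nn.ReLU(), nn.Linear(768, 768))
wrap_linears(encoder)

mask_params = [p for n, p in encoder.named_parameters() if "mask_logits" in n]
optimizer = torch.optim.Adam(mask_params, lr=1e-2)

x = torch.randn(8, 768)              # toy "hidden states"
labels = torch.randint(0, 5, (8,))   # toy token labels over 5 classes
readout = nn.Linear(768, 5)          # task readout (kept fixed here)

logits = readout(encoder(x))
loss = nn.functional.cross_entropy(logits, labels)
loss = loss + 1e-5 * sum(m.sparsity_penalty()
                         for m in encoder.modules()
                         if isinstance(m, MaskedLinear))
loss.backward()
optimizer.step()
```

Because only the mask logits receive gradients, probe "complexity" can be controlled through the sparsity penalty: a stronger penalty forces a smaller subnetwork, which is what makes the accuracy-versus-complexity comparison with MLP probes possible.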
