Understanding Learning Dynamics Of Language Models with SVCCA

Research has shown that neural models implicitly encode linguistic features, but there has been no research showing how these encodings arise as the models are trained. We present the first study on the learning dynamics of neural language models, using a simple and flexible analysis method called Singular Vector Canonical Correlation Analysis (SVCCA), which enables us to compare learned representations across time and across models without the need to evaluate directly on annotated data. We probe the evolution of syntactic, semantic, and topic representations and find that part-of-speech is learned earlier than topic; that recurrent layers become more similar to those of a tagger during training; and that embedding layers become less similar. Our results and methods could inform better learning algorithms for NLP models, for example by incorporating linguistic information more effectively.
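
To make the comparison method concrete, below is a minimal NumPy sketch of SVCCA as described by Raghu et al. (2017): each activation matrix is reduced with an SVD that keeps the directions explaining most of the variance, CCA is then applied to the reduced views, and the mean canonical correlation serves as the similarity score. This is an illustrative sketch under stated assumptions, not the implementation used in the paper; the function name svcca, the 99% variance threshold, and the synthetic example data are choices made here for illustration only.

```python
# Minimal SVCCA sketch (assumed details: variance threshold, no epsilon handling).
import numpy as np

def svcca(X, Y, var_kept=0.99):
    """Mean canonical correlation between two activation matrices.

    X, Y: arrays of shape (n_datapoints, n_neurons), computed on the same inputs
    so that rows are aligned across the two representations.
    """
    def reduce_svd(A, var_kept):
        A = A - A.mean(axis=0, keepdims=True)            # center each neuron
        U, s, _ = np.linalg.svd(A, full_matrices=False)
        cum = np.cumsum(s**2) / np.sum(s**2)             # cumulative variance explained
        keep = int(np.searchsorted(cum, var_kept)) + 1   # smallest prefix reaching threshold
        return U[:, :keep] * s[:keep]                    # top singular directions

    Xr, Yr = reduce_svd(X, var_kept), reduce_svd(Y, var_kept)

    # CCA via orthonormal bases: canonical correlations are the singular values
    # of Qx^T Qy, where Qx, Qy span the (centered) reduced views.
    def orthonormal_basis(A):
        U, _, _ = np.linalg.svd(A, full_matrices=False)
        return U

    Qx, Qy = orthonormal_basis(Xr), orthonormal_basis(Yr)
    rho = np.linalg.svd(Qx.T @ Qy, compute_uv=False)     # canonical correlations
    return float(np.mean(np.clip(rho, 0.0, 1.0)))

# Hypothetical usage: compare a layer's activations at two training checkpoints
# on the same data; values near 1.0 indicate nearly linearly related subspaces.
rng = np.random.default_rng(0)
acts_t1 = rng.normal(size=(1000, 256))
acts_t2 = acts_t1 @ rng.normal(size=(256, 256)) + 0.1 * rng.normal(size=(1000, 256))
print(svcca(acts_t1, acts_t2))
```

Because the score depends only on activation matrices collected from the same inputs, the same routine can compare a language model's layer to itself at different training steps, or to the corresponding layer of a separately trained tagger, which is the kind of comparison the study relies on.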
