Mimic and Conquer: Heterogeneous Tree Structure Distillation for Syntactic NLP

Syntax has been shown to be useful for various NLP tasks, yet existing work mostly encodes a single type of syntactic tree with one hierarchical neural network. In this paper, we investigate a simple and effective method, Knowledge Distillation, for integrating heterogeneous structure knowledge into a unified sequential LSTM encoder. Experimental results on four typical syntax-dependent tasks show that our method outperforms tree encoders by effectively integrating rich heterogeneous structural syntax while reducing error propagation, and also outperforms ensemble methods in terms of both efficiency and accuracy.
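For readers unfamiliar with the distillation setup, the sketch below shows one standard way such an objective can be written in PyTorch: a sequential LSTM student mimics the soft output distributions of several tree-structured teachers (e.g. constituency and dependency encoders) while also fitting the gold labels. The loss form, the averaging of teacher distributions, and the hyperparameters (`temperature`, `alpha`) are illustrative assumptions in the standard Hinton-style recipe, not necessarily the paper's exact objective.

```python
# Minimal sketch: distilling heterogeneous tree-structure teachers into a
# sequential LSTM student. The soft targets are the temperature-scaled,
# averaged teacher distributions; the student additionally fits gold labels.
import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits_list, gold_labels,
                      temperature=2.0, alpha=0.5):
    """student_logits: (batch, num_classes) from the sequential LSTM student.
    teacher_logits_list: list of (batch, num_classes) tensors, one per
    tree-structured teacher (e.g. constituency and dependency encoders).
    """
    # Average the teachers' tempered distributions into one soft target.
    soft_targets = torch.stack(
        [F.softmax(t / temperature, dim=-1) for t in teacher_logits_list]
    ).mean(dim=0)

    # KL divergence between the student's and teachers' soft distributions,
    # scaled by T^2 as in Hinton et al. (2015).
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    kd = F.kl_div(log_student, soft_targets,
                  reduction="batchmean") * temperature ** 2

    # Standard cross-entropy on the gold (hard) labels.
    ce = F.cross_entropy(student_logits, gold_labels)
    return alpha * kd + (1.0 - alpha) * ce
```

At inference time only the student runs, which is why such a distilled sequential encoder can beat an ensemble of tree encoders on efficiency while retaining most of their structural knowledge.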
