Efficient EUD Parsing

We present the system submission from the FASTPARSE team for the EUD Shared Task at IWPT 2020. We approached the task with a focus on efficiency, considering both training costs and inference efficiency. Our models combine distilled neural dependency parsers with a rule-based system that projects UD trees into EUD graphs. We obtained an average ELAS of 74.04 for our official submission, ranking 4th overall.
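
To give a rough sense of what projecting a UD tree into an EUD graph involves, the sketch below is a minimal illustration of our own devising, not the actual FASTPARSE rule set: it implements only the trivial baseline that copies each basic UD edge (HEAD:DEPREL) into the enhanced DEPS field of a CoNLL-U token. A full rule-based projector would typically add further transformations, e.g. for coordination and relative clauses; the token fields and helper names here are assumptions for the example.

    # Minimal sketch (illustrative, not the FASTPARSE rules): copy each
    # basic UD edge into the enhanced graph, i.e. DEPS = HEAD:DEPREL.
    def project_ud_to_eud(sentence):
        """sentence: list of dicts with CoNLL-U fields 'id', 'form',
        'head', 'deprel'. Adds a 'deps' field holding the enhanced edge."""
        for token in sentence:
            # The basic dependency becomes the only enhanced edge.
            token["deps"] = f'{token["head"]}:{token["deprel"]}'
        return sentence

    # Toy example: "She reads" -> nsubj and root edges copied verbatim.
    toy = [
        {"id": 1, "form": "She",   "head": 2, "deprel": "nsubj"},
        {"id": 2, "form": "reads", "head": 0, "deprel": "root"},
    ]
    for tok in project_ud_to_eud(toy):
        print(tok["id"], tok["form"], tok["deps"])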
