Paper information - Automated Concatenation of Embeddings for Structured Prediction

Pretrained contextualized embeddings are powerful word representations for structured prediction tasks. Recent work found that better word representations can be obtained by concatenating different types of embeddings. However, the selection of embeddings that forms the best concatenated representation usually varies with the task and the collection of candidate embeddings, and the ever-increasing number of embedding types makes the selection harder. In this paper, we propose Automated Concatenation of Embeddings (ACE) to automate the process of finding better concatenations of embeddings for structured prediction tasks, based on a formulation inspired by recent progress on neural architecture search. Specifically, a controller alternately samples a concatenation of embeddings, according to its current belief about how effective each embedding type is for the task, and updates that belief based on a reward. We follow strategies in reinforcement learning to optimize the parameters of the controller and compute the reward based on the accuracy of a task model, which is fed with the sampled concatenation as input and trained on a task dataset. Empirical results on 6 tasks and 21 datasets show that our approach outperforms strong baselines and achieves state-of-the-art performance with fine-tuned embeddings in all the evaluations.
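
The sample-then-update loop described in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes one independent Bernoulli selection per embedding type, a plain REINFORCE update with a moving-average baseline, and a hypothetical `toy_reward` function standing in for the accuracy of a trained task model. All names and hyperparameters here are illustrative.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample_mask(theta, rng):
    """Sample a 0/1 choice per embedding type from independent Bernoullis.

    theta[i] is the controller's logit (its "belief") for embedding type i.
    """
    return [1 if rng.random() < sigmoid(t) else 0 for t in theta]


def reinforce_step(theta, mask, reward, baseline, lr):
    """One REINFORCE update: theta += lr * (reward - baseline) * grad log p(mask).

    For a Bernoulli with logit t, d/dt log p(m) = m - sigmoid(t).
    """
    advantage = reward - baseline
    return [t + lr * advantage * (m - sigmoid(t)) for t, m in zip(theta, mask)]


def toy_reward(mask):
    """Hypothetical stand-in for task-model accuracy (assumption, not from the paper).

    Pretends embedding types 0 and 2 help and types 1 and 3 hurt.
    An empty concatenation gets reward 0 because the task model has no input.
    """
    if not any(mask):
        return 0.0
    useful = [1, 0, 1, 0]
    return sum(1 for m, u in zip(mask, useful) if m == u) / len(useful)


def search(num_embeddings=4, steps=300, lr=0.5, seed=0):
    """Alternate between sampling a concatenation and updating the belief."""
    rng = random.Random(seed)
    theta = [0.0] * num_embeddings  # start with no preference: p = 0.5 each
    baseline = 0.0
    best_mask, best_reward = None, -1.0
    for _ in range(steps):
        mask = sample_mask(theta, rng)
        reward = toy_reward(mask)  # in practice: train and evaluate a task model
        if reward > best_reward:
            best_mask, best_reward = mask, reward
        theta = reinforce_step(theta, mask, reward, baseline, lr)
        baseline = 0.9 * baseline + 0.1 * reward  # moving-average baseline
    return theta, best_mask, best_reward


if __name__ == "__main__":
    theta, best_mask, best_reward = search()
    print("selection probabilities:", [round(sigmoid(t), 2) for t in theta])
    print("best sampled concatenation:", best_mask, "reward:", best_reward)
```

In a real setting, each call to the reward function would involve training a task model on the selected concatenation and measuring its accuracy on development data, which is far more expensive than this toy function suggests.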
