Automated Concatenation of Embeddings for Structured Prediction

Pretrained contextualized embeddings are powerful word representations for structured prediction tasks. Recent work found that better word representations can be obtained by concatenating different types of embeddings. However, the selection of embeddings to form the best concatenated representation usually varies depending on the task and the collection of candidate embeddings, and the ever-increasing number of embedding types makes it a more difficult problem. In this paper, we propose Automated Concatenation of Embeddings (ACE) to automate the process of finding better concatenations of embeddings for structured prediction tasks, based on a formulation inspired by recent progress on neural architecture search. Specifically, a controller alternately samples a concatenation of embeddings, according to its current belief of the effectiveness of individual embedding types in consideration for a task, and updates the belief based on a reward. We follow strategies in reinforcement learning to optimize the parameters of the controller and compute the reward based on the accuracy of a task model, which is fed with the sampled concatenation as input and trained on a task dataset. Empirical results on 6 tasks and 21 datasets show that our approach outperforms strong baselines and achieves state-of-the-art performance with fine-tuned embeddings in all the evaluations.
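The sample-then-update loop described above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes one independent Bernoulli selection per embedding type, a plain REINFORCE update with a moving-average baseline, and a hypothetical `toy_reward` function standing in for "train the task model on the sampled concatenation and measure its accuracy". The per-embedding contributions inside `toy_reward` are made-up numbers used only so that the example runs.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def toy_reward(mask):
    """Hypothetical stand-in for task-model accuracy on a sampled concatenation.

    In the setting the abstract describes, this step would train a task model
    fed with the selected embeddings and return its accuracy. The
    contributions below are invented purely for illustration.
    """
    contributions = [0.3, 0.2, 0.0, -0.2]
    return 0.5 + sum(c * m for c, m in zip(contributions, mask))


def train_controller(num_embeddings=4, steps=2000, lr=0.5, seed=0):
    """Learn selection probabilities with REINFORCE (assumed update rule)."""
    rng = random.Random(seed)
    logits = [0.0] * num_embeddings  # the controller's "belief" per embedding type
    baseline = None                  # moving average of rewards, reduces variance

    for _ in range(steps):
        probs = [sigmoid(t) for t in logits]
        # Sample a concatenation: 1 means the embedding type is included.
        mask = [1 if rng.random() < p else 0 for p in probs]
        reward = toy_reward(mask)
        if baseline is None:
            baseline = reward
        advantage = reward - baseline
        # For a Bernoulli with probability sigmoid(logit), the gradient of
        # log p(mask) with respect to the logit is (mask - prob).
        logits = [
            t + lr * advantage * (m - p)
            for t, m, p in zip(logits, mask, probs)
        ]
        baseline = 0.9 * baseline + 0.1 * reward

    return [sigmoid(t) for t in logits]


if __name__ == "__main__":
    for i, p in enumerate(train_controller()):
        print(f"embedding {i}: selection probability {p:.2f}")
```

Under these assumptions, the selection probability of an embedding type that raises the toy reward tends toward 1, and that of one that lowers it tends toward 0. The expensive part in a real setting is the reward evaluation, which this sketch replaces with a lookup.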
