Larger-Context Tagging: When and Why Does It Work?

The development of neural networks and pretraining techniques has spawned many sentence-level tagging systems that achieve superior performance on standard benchmarks. A less discussed question, however, is what happens when more contextual information is fed into today's top-scoring tagging systems. Although several existing works have attempted to shift tagging from the sentence level to the document level, there is still no consensus on when and why it works, which limits the applicability of the larger-context approach to tagging tasks. In this paper, rather than pursuing a state-of-the-art tagging system through architectural exploration, we investigate when and why larger-context training, as a general strategy, can work. To this end, we conduct a thorough comparative study of four proposed aggregators for collecting contextual information and present an attribute-aided evaluation method to interpret the improvement brought by larger-context training. Experimentally, we set up a testbed covering four tagging tasks and thirteen datasets. We hope our preliminary observations deepen the understanding of larger-context training and inspire follow-up work on the use of contextual information.
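To make the idea of larger-context training concrete, below is a minimal sketch of one plausible aggregation scheme: concatenating a target sentence with its neighboring sentences so the tagger sees document-level context, while a mask marks which tokens actually receive labels. This is an illustrative assumption for exposition only; the four aggregator designs compared in the paper are not specified here, and the function and parameter names (aggregate_context, window) are hypothetical.

```python
# Illustrative sketch (not the paper's exact method): a simple
# concatenation-style context aggregator for sentence-level tagging.
from typing import List, Tuple


def aggregate_context(
    doc: List[List[str]],   # document as a list of tokenized sentences
    idx: int,               # index of the target sentence to tag
    window: int = 1,        # assumed knob: neighbor sentences per side
) -> Tuple[List[str], List[bool]]:
    """Return (tokens, label_mask): the enlarged input and a mask that
    is True only for tokens belonging to the target sentence."""
    lo = max(0, idx - window)
    hi = min(len(doc), idx + window + 1)
    tokens: List[str] = []
    mask: List[bool] = []
    for i in range(lo, hi):
        tokens.extend(doc[i])
        # Only the target sentence's tokens are scored/labeled;
        # neighboring sentences serve purely as context.
        mask.extend([i == idx] * len(doc[i]))
    return tokens, mask


if __name__ == "__main__":
    doc = [
        ["Obama", "visited", "Paris", "."],
        ["He", "met", "the", "president", "."],
        ["The", "talks", "went", "well", "."],
    ]
    tokens, mask = aggregate_context(doc, idx=1, window=1)
    print(tokens)  # target sentence plus one neighbor on each side
    print(mask)    # True only at the target sentence's positions
```

A sentence-level system corresponds to window=0; increasing the window trades longer inputs for more context, which is exactly the axis along which "when and why it works" can be studied.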
