Training with Streaming Annotation

In this paper, we address a practical scenario in which training data is released in a sequence of small-scale batches and the annotation in earlier phases is of lower quality than that in later phases. To tackle this situation, we utilize a pre-trained transformer network to preserve and integrate the most salient document information from the earlier batches while focusing on the annotation (presumably of higher quality) from the current batch. Using event extraction as a case study, we demonstrate experimentally that our proposed framework outperforms conventional approaches (with absolute F-score gains ranging from 3.6% to 14.9%), especially when the early annotation is noisier; our approach also saves 19.1% of training time relative to the best conventional method. A minimal sketch of this streaming training setup is given below.
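
The following is a minimal sketch of training on streaming annotation batches, not the paper's actual implementation: it assumes a BERT-style encoder loaded via the HuggingFace `transformers` library, a token-level tagging head for event triggers, and a simple loop that fine-tunes on each released batch in order. The model names, class names, and hyperparameters here are illustrative assumptions; the pre-trained encoder serves as the carrier of knowledge accumulated from earlier (noisier) batches, while each newly released batch drives the current fine-tuning objective.

```python
# Hedged sketch of sequential fine-tuning over streaming annotation batches.
# Assumptions (not from the paper): bert-base-uncased encoder, a linear
# trigger-tagging head, and per-batch fine-tuning in release order.
import torch
from torch import nn
from transformers import AutoModel


class TriggerTagger(nn.Module):
    """Pre-trained transformer encoder plus a linear tagging head."""

    def __init__(self, num_labels: int, model_name: str = "bert-base-uncased"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        self.head = nn.Linear(self.encoder.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        # (batch, seq_len, hidden) -> (batch, seq_len, num_labels)
        hidden = self.encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state
        return self.head(hidden)


def train_on_stream(model, annotation_batches, num_labels,
                    lr=2e-5, device="cpu"):
    """Fine-tune on each released annotation batch in order.

    Later (presumably cleaner) batches are seen last, so their gradients
    have the final say over the parameters, while the pre-trained encoder
    retains document information absorbed from earlier batches.
    """
    model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss(ignore_index=-100)  # -100 masks padding

    for batch in annotation_batches:  # one released batch at a time
        for input_ids, attention_mask, labels in batch:
            optimizer.zero_grad()
            logits = model(input_ids.to(device), attention_mask.to(device))
            loss = loss_fn(logits.view(-1, num_labels),
                           labels.view(-1).to(device))
            loss.backward()
            optimizer.step()
```

In this sketch, "focusing on the current batch" is realized simply by training on batches in release order; the paper's framework is more involved, but the loop illustrates the streaming regime in which earlier, noisier annotation is seen before later, higher-quality annotation.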
