PeerDA: Data Augmentation via Modeling Peer Relation for Span Identification Tasks

Span identification aims to identify specific text spans in a text input and classify them into pre-defined categories. Previous works train models using only the Subordinate (SUB) relation (i.e., whether a span is an instance of a certain category). In contrast, this paper is the first to explore the Peer (PR) relation, which indicates that two spans are instances of the same category and share similar features. Specifically, a novel Peer Data Augmentation (PeerDA) approach is proposed, which employs span pairs with the PR relation as augmentation data for training. PeerDA has two unique advantages: (1) a large number of PR span pairs are available for augmenting the training data; (2) the augmented data can prevent the trained model from over-fitting the superficial span-category mapping by pushing the model to leverage span semantics. Experimental results on ten datasets over four diverse tasks across seven domains demonstrate the effectiveness of PeerDA. Notably, PeerDA achieves state-of-the-art results on six of them.
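To make the PR relation concrete, the following is a minimal sketch of how span pairs sharing a category could be enumerated from labeled spans. It is an illustration only, not the authors' implementation: the function name `build_peer_pairs`, the `(text, category)` input format, and the toy data are all assumptions introduced here, and the abstract does not specify how PeerDA samples or encodes such pairs for training.

```python
from collections import defaultdict
from itertools import combinations


def build_peer_pairs(labeled_spans):
    """Enumerate unordered pairs of distinct span texts that share a category.

    `labeled_spans` is a hypothetical input format: an iterable of
    (span_text, category) tuples taken from annotated training data.
    """
    # Group unique span texts by category, preserving first-seen order.
    by_category = defaultdict(list)
    for text, category in labeled_spans:
        if text not in by_category[category]:
            by_category[category].append(text)

    # Every pair of spans within one category stands in the PR relation.
    pairs = []
    for category, spans in by_category.items():
        for span_a, span_b in combinations(spans, 2):
            pairs.append((span_a, span_b, category))
    return pairs


# Toy data (invented for illustration): three LOC spans and two PER spans.
toy_spans = [
    ("Paris", "LOC"),
    ("London", "LOC"),
    ("Alice", "PER"),
    ("Bob", "PER"),
    ("Tokyo", "LOC"),
]
peer_pairs = build_peer_pairs(toy_spans)
for pair in peer_pairs:
    print(pair)
```

A category with n distinct spans yields n(n-1)/2 pairs, which illustrates why the number of PR pairs available for augmentation can be large relative to the number of SUB (span, category) examples.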
