Iterative Paraphrastic Augmentation with Discriminative Span Alignment

We introduce a novel paraphrastic augmentation strategy based on sentence-level lexically constrained paraphrasing and discriminative span alignment. Our approach allows for the large-scale expansion of existing resources, or the rapid creation of new resources from a small, manually produced seed corpus. We illustrate our framework on the Berkeley FrameNet Project, a large-scale language understanding effort spanning more than two decades of human labor. With roughly four days spent collecting training data for the alignment model and approximately one day of parallel compute, we automatically generate 495,300 unique (Frame, Trigger) combinations annotated in context, a roughly 50x expansion over FrameNet v1.7.
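The loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Annotated`, `augment`, `toy_paraphrase`, and `toy_align` are hypothetical, the synonym table stands in for a lexically constrained neural paraphraser, and the position-copying aligner stands in for a learned discriminative span aligner. Banning every trigger already seen for a frame is likewise an assumption made for illustration.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotated:
    """A sentence with one frame annotation on a trigger span [start, end)."""
    tokens: tuple
    frame: str
    span: tuple

    @property
    def trigger(self) -> str:
        return " ".join(self.tokens[self.span[0]:self.span[1]])


def augment(seed, paraphrase, align, rounds):
    """Iteratively paraphrase annotated sentences and project the trigger span."""
    seen = {}  # frame -> set of triggers already collected (assumed constraint policy)
    for ex in seed:
        seen.setdefault(ex.frame, set()).add(ex.trigger)
    frontier, out = list(seed), []
    for _ in range(rounds):
        next_frontier = []
        for ex in frontier:
            # Negative lexical constraint: forbid triggers we already have.
            new_tokens = paraphrase(ex.tokens, seen[ex.frame])
            if new_tokens is None:
                continue
            # Project the source trigger span onto the paraphrase.
            new_span = align(ex.tokens, ex.span, new_tokens)
            if new_span is None:
                continue
            cand = Annotated(tuple(new_tokens), ex.frame, new_span)
            if cand.trigger in seen[ex.frame]:
                continue
            seen[ex.frame].add(cand.trigger)
            out.append(cand)
            next_frontier.append(cand)  # new outputs seed the next round
        frontier = next_frontier
    return out


# Toy stand-ins (hypothetical) so the sketch runs end to end.
SYNONYMS = {
    "bought": ["purchased", "acquired"],
    "purchased": ["bought", "acquired"],
    "acquired": ["obtained"],
}


def toy_paraphrase(tokens, banned):
    """Swap the first banned token for an allowed synonym, if one exists."""
    for i, tok in enumerate(tokens):
        if tok in banned:
            for alt in SYNONYMS.get(tok, ()):
                if alt not in banned:
                    return tokens[:i] + (alt,) + tokens[i + 1:]
    return None


def toy_align(src_tokens, src_span, tgt_tokens):
    """Copy the span when lengths match; a real aligner would score candidate spans."""
    return src_span if len(src_tokens) == len(tgt_tokens) else None


seed = Annotated(("She", "bought", "a", "car"), "Commerce_buy", (1, 2))
result = augment([seed], toy_paraphrase, toy_align, rounds=3)
print([(a.frame, a.trigger) for a in result])
```

Each round feeds its own outputs back in as inputs, which is what makes the expansion iterative: a single seed annotation can yield several new (Frame, Trigger) pairs, each still annotated in a full sentence context.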
