L2F/INESC-ID at SemEval-2019 Task 2: Unsupervised Lexical Semantic Frame Induction using Contextualized Word Representations

Building large datasets annotated with semantic information, such as FrameNet, is expensive. Consequently, such resources are unavailable for many languages and specialized domains. This problem can be alleviated by using unsupervised approaches to induce the frames evoked by a collection of documents. That is the objective of SemEval-2019 Task 2, which comprises three subtasks: clustering verbs that evoke the same frame, clustering arguments into frame-specific slots, and clustering arguments into semantic roles. We approach all three subtasks by applying a graph clustering algorithm to contextualized embedding representations of the verbs and arguments. Such representations are appropriate for this task because they provide cues for word-sense disambiguation and can therefore be used to distinguish the different frames evoked by the same word. With this approach, we outperformed all of the baselines reported for the task on the test set in terms of Purity F1 and, in most cases, in terms of BCubed F1.
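As a rough illustration of graph clustering over contextualized embeddings, the sketch below links occurrences whose vectors are similar and then propagates cluster labels along the edges. It is a minimal sketch under stated assumptions, not the system described here: the abstract does not name the clustering algorithm, so a Chinese Whispers-style label propagation is assumed, and the toy 2-dimensional vectors, the cosine threshold of 0.8, and the function names `build_graph` and `chinese_whispers` are all hypothetical stand-ins for real contextualized embeddings and tuned settings.

```python
import math
import random
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_graph(vectors, threshold):
    """Link every pair of items whose cosine similarity exceeds the threshold.

    The threshold is an assumed hyperparameter, not a value from the paper.
    """
    graph = defaultdict(dict)
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            sim = cosine(vectors[i], vectors[j])
            if sim > threshold:
                graph[i][j] = sim
                graph[j][i] = sim
    return graph


def chinese_whispers(n_nodes, graph, iterations=20, seed=0):
    """Chinese Whispers-style label propagation (assumed algorithm).

    Every node starts in its own cluster and repeatedly adopts the label
    with the largest total edge weight among its neighbours.
    """
    rng = random.Random(seed)
    labels = list(range(n_nodes))
    nodes = list(range(n_nodes))
    for _ in range(iterations):
        rng.shuffle(nodes)
        changed = False
        for node in nodes:
            scores = defaultdict(float)
            for neighbour, weight in graph.get(node, {}).items():
                scores[labels[neighbour]] += weight
            if scores:
                best = max(scores, key=scores.get)
                if best != labels[node]:
                    labels[node] = best
                    changed = True
        if not changed:
            break
    return labels


# Toy stand-ins for contextualized embeddings of four verb occurrences:
# the first two point in one direction, the last two in another.
vectors = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
labels = chinese_whispers(len(vectors), build_graph(vectors, threshold=0.8))
print(labels)
```

In this toy input only the pairs (0, 1) and (2, 3) exceed the threshold, so the two pairs end up in separate clusters. Because each occurrence has its own vector, two occurrences of the same verb can land in different clusters, which is the property that lets such representations separate frames evoked by the same word.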
