A Distant Supervision Approach to Semantic Role Labeling

Semanticrolelabelinghasbecomeakeymodule for many language processing applications such as question answering, information extraction, sentiment analysis, and machine translation. To build an unrestricted semantic role labeler, the first step is to develop a comprehensive proposition bank. However, creating such a bank is a costly enterprise, which has only been achieved for a handful of languages. In this paper, we describe a technique to build proposition banks for new languages using distant supervision. Starting from PropBank inEnglishandlooselyparallelcorporasuchas versions of Wikipedia in different languages, we carried out a mapping of semantic propositions we extracted from English to syntactic structures in Swedish using named entities. We trained a semantic parser on the generated Swedishpropositionsandwereporttheresults we obtained. Using the CoNLL 2009 evaluation script, we could reach the scores of 52.25 for labeled propositions and 62.44 for the unlabeled ones. We believe our approach can be appliedtotrainsemanticrolelabelersforother resource-scarce languages.

[1]  Mirella Lapata,et al.  Using Semantic Roles to Improve Question Answering , 2007, EMNLP.

[2]  Razvan C. Bunescu,et al.  Using Encyclopedic Knowledge for Named entity Disambiguation , 2006, EACL.

[3]  Pierre Nugues,et al.  KOSHIK- A Large-scale Distributed Computing Framework for NLP , 2014, ICPRAM.

[4]  Daniel Jurafsky,et al.  Distant supervision for relation extraction without labeled data , 2009, ACL.

[5]  Gerhard Weikum,et al.  YAGO2: A Spatially and Temporally Enhanced Knowledge Base from Wikipedia: Extended Abstract , 2013, IJCAI.

[6]  Richard Johansson,et al.  Dependency-based Syntactic–Semantic Analysis with PropBank and NomBank , 2008, CoNLL.

[7]  Daniel Gildea,et al.  The Proposition Bank: An Annotated Corpus of Semantic Roles , 2005, CL.

[8]  Richard Johansson,et al.  Extracting Opinion Expressions and Their Polarities - Exploration of Pipelines and Joint Models , 2011, ACL.

[9]  Sanjay Ghemawat,et al.  MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.

[10]  Ian H. Witten,et al.  Learning to link with wikipedia , 2008, CIKM '08.

[11]  Daniel S. Weld,et al.  Learning 5000 Relational Extractors , 2010, ACL.

[12]  Jens Lehmann,et al.  DBpedia - A crystallization point for the Web of Data , 2009, J. Web Semant..

[13]  Mirella Lapata,et al.  Cross-lingual Annotation Projection for Semantic Roles , 2009, J. Artif. Intell. Res..

[14]  Ramesh Nallapati,et al.  Multi-instance Multi-label Learning for Relation Extraction , 2012, EMNLP.

[15]  Pascale Fung,et al.  Lexical Semantics for Statistical Machine Translation , 2011 .

[16]  Oren Etzioni,et al.  Modeling Missing Data in Distant Supervision for Information Extraction , 2013, TACL.

[17]  Praveen Paritosh,et al.  Freebase: a collaboratively created graph database for structuring human knowledge , 2008, SIGMOD Conference.

[18]  Pierre Nugues,et al.  A High-Performance Syntactic and Semantic Dependency Parser , 2010, COLING.

[19]  Pierre Nugues,et al.  Multilingual Semantic Role Labeling , 2009, CoNLL Shared Task.

[20]  Chih-Jen Lin,et al.  LIBLINEAR: A Library for Large Linear Classification , 2008, J. Mach. Learn. Res..

[21]  Joakim Nivre,et al.  MaltParser: A Data-Driven Parser-Generator for Dependency Parsing , 2006, LREC.

[22]  Robert Östling,et al.  Stagger: an Open-Source Part of Speech Tagger for Swedish , 2013 .

[23]  Ding Liu,et al.  Semantic Role Features for Machine Translation , 2010, COLING.

[24]  Doug Downey,et al.  Local and Global Algorithms for Disambiguation to Wikipedia , 2011, ACL.

[25]  Oren Etzioni,et al.  Semantic Role Labeling for Open Information Extraction , 2010, HLT-NAACL 2010.

[26]  Mark Craven,et al.  Constructing Biological Knowledge Bases by Extracting Information from Text Sources , 1999, ISMB.