Dependency-Based Text Compression for Semantic Relation Extraction

The application of linguistic patterns and rules is one of the main approaches to Information Extraction as well as to high-quality ontology population. However, the lack of flexibility of linguistic patterns often causes low coverage. This paper presents a weakly-supervised rule-based approach to Relation Extraction which performs partial dependency parsing in order to simplify the linguistic structure of a sentence. This simplification allows us to apply generic semantic extraction rules, obtained with a distant-supervision strategy which takes advantage of semi-structured resources. The rules are added to a partial dependency grammar, which is compiled into a parser capable of extracting instances of the desired relations. Experiments on different Spanish and Portuguese corpora show that this method maintains the high precision of rule-based approaches while improving their recall.
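
The idea of simplifying a sentence before applying a generic extraction rule can be illustrated with a minimal sketch. It is not the grammar or parser described in the paper: a plain part-of-speech filter stands in for partial dependency parsing, and the tag set, the `KEEP` set, the example sentence, and the single `PROPN VERB ADP PROPN` rule are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # (word, coarse POS tag); tags are hand-assigned here

# Assumption: only these tags carry the core relational structure.
KEEP = {"PROPN", "VERB", "ADP"}


def compress(tokens: List[Token]) -> List[Token]:
    """Drop tokens outside KEEP and merge adjacent proper nouns into one unit."""
    out: List[Token] = []
    prev_tag = None
    for word, tag in tokens:
        if tag in KEEP:
            if tag == "PROPN" and prev_tag == "PROPN":
                # Contiguous proper nouns form a single entity mention.
                out[-1] = (out[-1][0] + "_" + word, "PROPN")
            else:
                out.append((word, tag))
        prev_tag = tag
    return out


def extract(tokens: List[Token]) -> Optional[Tuple[str, str, str]]:
    """Apply one generic rule to the compressed sentence: PROPN VERB ADP PROPN."""
    compressed = compress(tokens)
    tags = [tag for _, tag in compressed]
    if tags == ["PROPN", "VERB", "ADP", "PROPN"]:
        subj, verb, prep, obj = (word for word, _ in compressed)
        return (subj, verb + "_" + prep, obj)
    return None


# Hypothetical, hand-tagged input sentence.
SENTENCE: List[Token] = [
    ("José", "PROPN"), ("Saramago", "PROPN"), (",", "PUNCT"),
    ("the", "DET"), ("famous", "ADJ"), ("writer", "NOUN"), (",", "PUNCT"),
    ("was", "AUX"), ("born", "VERB"), ("in", "ADP"), ("Azinhaga", "PROPN"),
]

print(compress(SENTENCE))
print(extract(SENTENCE))
```

Because the apposition and the auxiliary are removed before matching, one short rule covers many surface variants of the same sentence, which is the intuition behind gaining recall without loosening the rule itself.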
