Extraction of Relation Descriptors for Portuguese Using Conditional Random Fields

Relation Extraction (RE), an important task in Information Extraction, consists of detecting and characterizing the semantic relations between entities in text. This work proposes a new process for extracting relation descriptors of any kind between Named Entities (NEs) in the Organization domain for the Portuguese language, using the Conditional Random Fields (CRF) model. For example, from the sentence fragment “Microsoft headquartered in Redmond, […]”, we can extract the relation descriptor “headquartered-in”, which relates the NEs “Microsoft” and “Redmond”. We evaluated different feature configurations for the CRF. The best results were obtained by including a semantic feature based on the NE category, since this feature better expresses the kind of relationship we want to identify between the pair of NEs. The proposed process achieved F-measures of 45% and 53% for complete and partial matching, respectively.
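Treating descriptor extraction as CRF sequence labeling involves a few steps around the model itself. The following is a minimal sketch of those steps, not the authors' configuration. It builds per-token feature dicts, including a category-pair feature in the spirit of the NE-category semantic feature. It then decodes BIO-style labels into a descriptor such as “headquartered-in” and scores complete versus partial matches. Several details are illustrative assumptions: the feature names, the `B-REL`/`I-REL` tag set, single-token NEs, and the overlap-based definition of a partial match. The CRF training step is omitted. The feature dicts follow the list-of-dicts-per-sentence format that CRFsuite-style toolkits (for example sklearn-crfsuite) accept.

```python
from typing import Dict, List, Optional, Tuple

Token = Tuple[str, str]   # (word, NE category); "O" marks non-entity tokens
Span = Tuple[int, int]    # [start, end) token offsets of a relation descriptor


def token_features(sent: List[Token], i: int, e1: int, e2: int) -> Dict[str, object]:
    """CRF features for token i; e1 < e2 index the (assumed single-token) NE pair."""
    word, ne_cat = sent[i]
    feats: Dict[str, object] = {
        "lower": word.lower(),
        "istitle": word.istitle(),
        "ne_cat": ne_cat,
        "between_pair": e1 < i < e2,
        # Assumed form of the semantic feature: the NE category pair, e.g. "ORG-LOC".
        "pair_cats": f"{sent[e1][1]}-{sent[e2][1]}",
    }
    if i > 0:
        feats["prev_lower"] = sent[i - 1][0].lower()
    if i + 1 < len(sent):
        feats["next_lower"] = sent[i + 1][0].lower()
    return feats


def sentence_features(sent: List[Token], e1: int, e2: int) -> List[Dict[str, object]]:
    # One feature dict per token, the per-sentence input CRFsuite-style toolkits take.
    return [token_features(sent, i, e1, e2) for i in range(len(sent))]


def decode_span(tags: List[str]) -> Optional[Span]:
    """Return the first B-REL I-REL* run as a [start, end) span, or None."""
    start: Optional[int] = None
    for i, tag in enumerate(tags):
        if start is None:
            if tag == "B-REL":
                start = i
        elif tag != "I-REL":
            return (start, i)
    return (start, len(tags)) if start is not None else None


def descriptor(sent: List[Token], span: Optional[Span]) -> Optional[str]:
    # Join the descriptor tokens with hyphens, e.g. "headquartered-in".
    if span is None:
        return None
    return "-".join(word for word, _ in sent[span[0]:span[1]])


def f_measure(gold: List[Optional[Span]], pred: List[Optional[Span]], partial: bool) -> float:
    """Complete match: identical spans. Partial match (assumed): any token overlap."""
    tp = 0
    for g, p in zip(gold, pred):
        if g is None or p is None:
            continue
        if p == g or (partial and p[0] < g[1] and g[0] < p[1]):
            tp += 1
    n_pred = sum(p is not None for p in pred)
    n_gold = sum(g is not None for g in gold)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    sent = [("Microsoft", "ORG"), ("headquartered", "O"), ("in", "O"),
            ("Redmond", "LOC"), (",", "O")]
    tags = ["O", "B-REL", "I-REL", "O", "O"]  # gold labels for the NE pair (0, 3)
    print(sentence_features(sent, 0, 3)[1]["pair_cats"])  # ORG-LOC
    print(descriptor(sent, decode_span(tags)))            # headquartered-in
```

Under this overlap-based definition, the prediction “headquartered” alone, against the gold “headquartered-in”, counts as a partial match but not a complete one. That gap is the kind of difference separating the two F-measures reported above.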
