Extracting Meta Statements from the Blogosphere

Information extraction systems have recently been proposed for organizing and exploring content in large online text corpora as information networks. In such networks, the nodes are named entities (e.g., people, organizations), while the edges correspond to statements indicating relations among those entities. To date, such systems extract rather primitive networks, capturing only those relations that are expressed by direct statements. In many applications, it is useful to also extract more subtle relations, which are often expressed as meta statements in the text. These can, for instance, provide the context for a statement (e.g., “Google acquired YouTube in October 2006”) or describe the repercussions of a statement (e.g., “The US condemned Russia’s invasion of Georgia”). In this work, we report on a system for extracting relations expressed in both direct statements and meta statements. We propose a method based on Conditional Random Fields that exploits syntactic features to extract both kinds of statements seamlessly. We follow the Open Information Extraction paradigm, in which a classifier is trained to recognize any type of relation rather than a fixed set of specific ones. Our results show substantial improvements over a state-of-the-art information extraction system, both in terms of accuracy and, especially, recall.
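To make the distinction between direct and meta statements concrete, the two examples above can be written down as a small data structure. This is only an illustrative sketch: the class names `Statement` and `MetaStatement`, their fields, and the relation labels are assumptions made for this example, not the representation used by the system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Statement:
    """A direct statement: one edge between two named entities."""
    subject: str
    relation: str
    obj: str


@dataclass(frozen=True)
class MetaStatement:
    """A statement about another statement (hypothetical representation)."""
    relation: str
    target: Statement
    subject: Optional[str] = None  # entity reacting to the target, if any
    context: Optional[str] = None  # qualifier of the target, e.g. a date


# "Google acquired YouTube" with its date as context.
acquisition = Statement("Google", "acquired", "YouTube")
dated = MetaStatement(relation="date", target=acquisition, context="October 2006")

# "The US condemned Russia's invasion of Georgia": a repercussion.
invasion = Statement("Russia", "invasion of", "Georgia")
condemnation = MetaStatement(relation="condemned", target=invasion, subject="US")
```

In this sketch a meta statement points at a whole statement rather than at a single entity, which is what a network built only from direct statements cannot express.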
