Towards Enriching DBpedia from Vertical Enumerative Structures Using a Distant Learning Approach

Automatic construction of semantic resources at large scale usually relies on general-purpose corpora such as Wikipedia. This resource, by nature rich in encyclopedic knowledge, exposes part of this knowledge through strongly structured elements (infoboxes, categories, etc.). Several extractors have targeted these structures in order to enrich or populate semantic resources such as DBpedia, YAGO or BabelNet. The remaining semi-structured textual structures, such as vertical enumerative structures (those using typographic and dispositional layout), have however been under-exploited. Yet they are frequent in corpora and are rich sources of specific semantic relations, such as hypernymy. This paper presents a distant learning approach for extracting hypernym relations from vertical enumerative structures of Wikipedia, with the aim of enriching DBpedia. Our relation extraction approach achieves an overall precision of 62%, and 99% of the extracted relations can enrich DBpedia, with respect to a reference corpus.
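To make the idea concrete, the following is a minimal sketch of how distant labeling over a vertical enumerative structure might look. It is not the authors' pipeline: the abstract does not describe their features or classifier, so everything here is an assumption made for illustration. `TOY_KB` is a hypothetical stand-in for DBpedia type assertions, `SAMPLE` is an invented wiki-style list, and the function names are ours. The sketch pairs each list item with each word of the introductory line (the primer), then labels a pair as positive when the knowledge base already records that relation. Items unknown to the knowledge base are left unlabeled; they are the ones a trained extractor could later contribute as new relations.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical stand-in for DBpedia: entity -> known types (hypernyms).
TOY_KB: Dict[str, Set[str]] = {
    "violin": {"instrument"},
    "cello": {"instrument"},
}

# Invented example of a vertical enumerative structure: a primer line
# followed by bulleted items.
SAMPLE = """
Types of string instrument:
* Violin
* Cello
* Viola
"""


def parse_enumeration(text: str) -> Tuple[str, List[str]]:
    """Split a wiki-style vertical list into its primer and its items."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    primer = lines[0].rstrip(":")
    # Drop leading bullet characters ('*', '-') and spaces from each item.
    items = [ln.lstrip("*- ").strip() for ln in lines[1:]]
    return primer, items


def candidate_pairs(primer: str, items: List[str]) -> List[Tuple[str, str]]:
    """Pair every item with every primer word as (hyponym, hypernym candidate)."""
    tokens = [t.lower().strip(",.;") for t in primer.split()]
    return [(item.lower(), tok) for item in items for tok in tokens if tok]


def distant_label(
    pairs: List[Tuple[str, str]], kb: Dict[str, Set[str]]
) -> List[Tuple[str, str, bool]]:
    """Label pairs using the knowledge base instead of manual annotation.

    A pair is positive if the KB lists the candidate as a type of the item,
    negative if the KB knows the item but not that type. Items absent from
    the KB are skipped, since the KB cannot supervise them.
    """
    labeled = []
    for hypo, hyper in pairs:
        if hypo not in kb:
            continue
        labeled.append((hypo, hyper, hyper in kb[hypo]))
    return labeled


if __name__ == "__main__":
    primer, items = parse_enumeration(SAMPLE)
    for row in distant_label(candidate_pairs(primer, items), TOY_KB):
        print(row)
```

In this toy run, "Viola" receives no label because the toy knowledge base does not know it. A real system would also need term extraction, lemmatization and entity linking rather than the naive whitespace tokenization used here.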
