Extracting hypernym relations from Wikipedia disambiguation pages: comparing symbolic and machine learning approaches

Extracting hypernym relations from text is one of the key steps in the construction and enrichment of semantic resources. Several methods have been proposed in the literature. However, the strengths of each approach on the same corpus are still poorly identified, which makes it hard to take advantage of their complementarity. In this paper, we study how complementary two approaches of different natures are when identifying hypernym relations in a structured corpus containing both well-written text and syntactically poor formulations, together with rich formatting. A symbolic approach based on lexico-syntactic patterns and a statistical approach using a supervised learning method are applied to a sub-corpus of the French Wikipedia composed of disambiguation pages. These pages, particularly rich in hypernym relations, contain both kinds of formulations. We compared the results of each approach taken independently with the performance obtained by combining their individual results. We obtain the best results in the latter case, with an F-measure of 0.75. In addition, 55% of the relations identified by our approach, with respect to a reference corpus, are not expressed in the French DBpedia and could be used to enrich this resource.
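To make the evaluation setting concrete, the following minimal sketch shows how precision, recall and F-measure can be computed for two extractors and for their combination. It rests on assumptions that are not in the abstract: the combination is modelled as a plain set union of the two outputs (the abstract does not say how the results are combined), and the relation pairs, the names `prf`, `pattern_based` and `learned`, and the gold set are invented toy data for illustration only, not results from the paper.

```python
from typing import Set, Tuple

# A relation is a (hyponym, hypernym) pair. Hypothetical representation.
Relation = Tuple[str, str]


def prf(predicted: Set[Relation], gold: Set[Relation]) -> Tuple[float, float, float]:
    """Return (precision, recall, F-measure) of predicted relations against a gold set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Toy gold standard and toy system outputs (invented, not taken from the paper).
gold = {
    ("mercure", "planète"),
    ("mercure", "métal"),
    ("java", "île"),
    ("java", "langage"),
    ("java", "danse"),
}
pattern_based = {("mercure", "planète"), ("java", "île")}
learned = {("mercure", "métal"), ("java", "île"), ("java", "café")}

# Assumption: combining the two approaches means taking the union of their outputs.
combined = pattern_based | learned

for name, output in [("patterns", pattern_based), ("learning", learned), ("combined", combined)]:
    p, r, f = prf(output, gold)
    print(f"{name}: precision={p:.2f} recall={r:.2f} F-measure={f:.2f}")
```

With a union, recall can only stay the same or increase, while precision depends on how many errors each extractor contributes, so a gain in F-measure is possible but not guaranteed.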
