Discovering Hypernymy Relations using Text Layout

Hypernymy relation acquisition has been widely investigated, especially because taxonomies, which often constitute the backbone of semantic resources, are structured using this type of relation. Although many approaches have been dedicated to this task, most of them analyze only the written text. However, relations between textual units that are not necessarily contiguous can also be expressed through typographical or dispositional markers. Such relations, which are out of reach of standard NLP tools, have so far been investigated only in well-specified layout contexts. Our aim is to improve relation extraction by considering both the plain text and the layout. We propose a method that combines layout, discourse and terminological analyses, and performs a structured prediction. We focus on textual structures that correspond to a well-defined discourse structure and that often bear hypernymy relations. This type of structure encompasses titles and sub-titles, as well as enumerative structures. The method achieves a precision of about 60%.
