Linked hypernyms: Enriching DBpedia with Targeted Hypernym Discovery

The Linked Hypernyms Dataset (LHD) provides entities described by Dutch, English, and German Wikipedia articles with types in the DBpedia namespace. The types are extracted from the first sentences of Wikipedia articles using Hearst pattern matching over part-of-speech annotated text and are disambiguated to DBpedia concepts. The dataset contains 1.3 million RDF type triples derived from English Wikipedia, of which 1 million do not overlap with DBpedia and 0.4 million do not overlap with YAGO2s. About 770 thousand German and 650 thousand Dutch Wikipedia entities are assigned a novel type, which exceeds the number of entities in the localized DBpedia for the respective language. RDF type triples from the German dataset have been incorporated into the German DBpedia. Quality assessment was performed on the basis of 16,500 human ratings and annotations in total. The average accuracy is 0.86 for the English dataset, 0.77 for German, and 0.88 for Dutch; the accuracy of the raw plain-text hypernyms exceeds 0.90 for all three languages. The LHD release described and evaluated in this article targets DBpedia 3.8; an LHD version for DBpedia 3.9, containing approximately 4.5 million RDF type triples, is also available.
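
As a rough illustration of the extraction-to-triple step described above, the sketch below approximates the Hearst-style pattern with a plain regular expression instead of matching over part-of-speech annotated text, and it maps the raw hypernym string to a DBpedia resource IRI by simple capitalisation rather than by the disambiguation step used in LHD. The entity, example sentence, and function names are illustrative assumptions, not part of the actual pipeline.

```python
import re
from typing import Optional

# Simplified, hypothetical sketch of first-sentence hypernym extraction.
# LHD applies Hearst-style patterns over POS-annotated text and then
# disambiguates the hypernym to a DBpedia concept; here a bare regex and
# a naive IRI guess stand in for both steps.

COPULA = re.compile(r"\b(?:is|was|are|were)\s+(?:an?|the)\s+([^,.;]+)",
                    re.IGNORECASE)
BOUNDARY = {"of", "in", "from", "for", "who", "which", "that", "and", "with"}


def extract_hypernym(first_sentence: str) -> Optional[str]:
    """Return a head-like noun following the copula in the first sentence."""
    match = COPULA.search(first_sentence)
    if not match:
        return None
    head = []
    for token in match.group(1).split():
        if token.lower() in BOUNDARY:  # stop before post-modifiers
            break
        head.append(token)
    return head[-1].lower() if head else None


def rdf_type_triple(entity: str, hypernym: str) -> str:
    """Serialize the result as one N-Triples rdf:type statement."""
    subject = f"<http://dbpedia.org/resource/{entity}>"
    predicate = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
    obj = f"<http://dbpedia.org/resource/{hypernym.capitalize()}>"
    return f"{subject} {predicate} {obj} ."


if __name__ == "__main__":
    sentence = ("Albert Einstein was a German-born theoretical physicist "
                "who developed the theory of relativity.")
    hypernym = extract_hypernym(sentence)  # -> "physicist"
    if hypernym:
        print(rdf_type_triple("Albert_Einstein", hypernym))
```

In the real dataset the final object IRI is produced by entity linking against DBpedia rather than by capitalising the raw hypernym string, which is why the abstract reports separately that the accuracy of the raw plain-text hypernyms exceeds 0.90.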
