What kind of knowledge is in Wikipedia? Unsupervised extraction of properties for similar concepts

This article presents a novel method for extracting knowledge from Wikipedia and a classification schema for annotating the extracted knowledge. Unlike the majority of approaches in the literature, we use the raw Wikipedia text for knowledge acquisition. The main assumption is that concepts classified under the same node in a taxonomy are described in a comparable way in Wikipedia. The extracted knowledge is annotated at two levels: ontological and logical. The extracted properties are evaluated in two ways: in the traditional way, by computing the precision of the extraction procedure, and in a clustering task. The second method of evaluation is seldom used in the natural language processing community, but it is regularly employed in cognitive psychology.
