Extraction of Semantic Relations between Concepts with KNN Algorithms on Wikipedia

This paper presents methods for extracting semantic relations between words. The methods rely on k-nearest neighbor algorithms and two semantic similarity measures to extract relations from the abstracts of Wikipedia articles. We analyze the proposed methods and evaluate their performance. The best method achieves an extraction precision of 83%. We also present an open source system that efficiently implements the described algorithms.
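The general idea of nearest-neighbor relation extraction can be illustrated with a minimal sketch. It is not the system described here: the abstract does not name the two similarity measures, so cosine similarity over bag-of-words vectors is assumed purely for illustration, and the function names (`bag_of_words`, `cosine`, `knn_relations`) and the toy definitions are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercase, whitespace-tokenized term counts (an assumed representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_relations(definitions, k=2):
    """Relate each concept to its k most similar concepts.

    `definitions` maps a concept to a short text, standing in for the
    abstract of its Wikipedia article. Pairs with zero similarity are skipped.
    """
    vectors = {c: bag_of_words(t) for c, t in definitions.items()}
    relations = set()
    for concept, vec in vectors.items():
        scored = sorted(
            ((cosine(vec, other_vec), other)
             for other, other_vec in vectors.items() if other != concept),
            reverse=True,
        )
        for score, other in scored[:k]:
            if score > 0:
                relations.add((concept, other))
    return relations


# Toy data invented for this example; not taken from Wikipedia.
toy = {
    "apple": "sweet fruit growing on a tree",
    "pear": "sweet fruit growing on a tree similar to apple",
    "car": "road vehicle with an engine and four wheels",
    "truck": "large road vehicle with an engine used for cargo",
}

relations = knn_relations(toy, k=1)
```

With `k=1`, each toy concept is linked only to its single closest neighbor, so fruits pair with fruits and vehicles with vehicles. A larger `k` typically raises recall at the cost of precision.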
