A Category-Driven Approach to Deriving Domain Specific Subset of Wikipedia

While many researchers attempt to build different kinds of ontologies from Wikipedia, the possibility of deriving a high-quality domain-specific subset of Wikipedia using its own category structure remains undervalued. In this paper we demonstrate the necessity of such processing and propose an appropriate technique. As a result, the size of the knowledge base for our text processing framework has been reduced by more than an order of magnitude, while the precision of disambiguating musical metadata (ID3 tags) has decreased from 98% to 64%.
