Towards an enhanced and adaptable ontology by distilling and assembling online encyclopedias

In this paper, we investigate how to make better use of semantic knowledge obtained from different encyclopedia sources. We propose a framework that integrates different encyclopedias and reorganizes their information. We also use Learning to Rank models to distill more functional knowledge from the encyclopedic information and then align that knowledge with a WordNet-like ontology. Finally, as a demonstration, we construct a Chinese semantic knowledge repository named JNet based on this framework. Experiments show that the proposed methods work well and that the three steps reinforce each other towards a more powerful ontology.
