Self-Supervised Chinese Ontology Learning from Online Encyclopedias

Constructing an ontology manually is a time-consuming, error-prone, and tedious task. We present SSCO, a Chinese ontology built with self-supervised learning, which contains about 255 thousand concepts, 5 million entities, and 40 million facts. We explore the three largest online Chinese encyclopedias for ontology learning and describe how to transfer their structured knowledge, including article titles, category labels, redirection pages, taxonomy systems, and InfoBox modules, into ontological form. To avoid the errors in the encyclopedias and to enrich the learnt ontology, we also apply machine learning based methods. First, we show, both statistically and experimentally, that self-supervised machine learning is practicable for Chinese relation extraction (at least for synonymy and hyponymy). We then train self-supervised models (SVMs and CRFs) for synonymy extraction, concept-subconcept relation extraction, and concept-instance relation extraction; the advantage of these methods is that all training examples are generated automatically from the structural information of the encyclopedias and a few general heuristic rules. Finally, we evaluate SSCO on two aspects, scale and precision: manual evaluation shows that the ontology has excellent precision, and a comparison with other well-known ontologies and knowledge bases indicates high coverage. The experimental results also indicate that the self-supervised models clearly enrich SSCO.
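
To make the idea of automatically generated training examples concrete, the following is a minimal sketch rather than the method used for SSCO. It assumes that redirection pages can be read as alias-to-title mappings and that category labels are available per article. The function names, the sample data, and the suffix heuristic (a title that ends with its category label is treated as a sub-concept candidate) are all hypothetical illustrations of "structural information plus a few general heuristic rules".

```python
from typing import Dict, List, Tuple

Example = Tuple[str, str, str]  # (term_a, term_b, relation label)


def synonym_examples(redirects: Dict[str, str]) -> List[Example]:
    """Treat each redirect (alias -> target title) as a positive synonymy example.

    Assumption: a redirection page signals that alias and target name the same thing.
    """
    return [
        (alias, target, "synonym")
        for alias, target in sorted(redirects.items())
        if alias != target  # skip self-redirects
    ]


def hyponym_examples(categories: Dict[str, List[str]]) -> List[Example]:
    """Hypothetical heuristic: a title that ends with its category label
    (and is longer than it) is taken as a sub-concept of that category."""
    examples: List[Example] = []
    for title, labels in sorted(categories.items()):
        for label in labels:
            if title != label and title.endswith(label):
                examples.append((title, label, "hyponym"))
    return examples


# Illustrative data only; not taken from any encyclopedia dump.
redirects = {"番茄": "西红柿", "西红柿": "西红柿"}
categories = {"智能手机": ["手机", "电子产品"]}

training_set = synonym_examples(redirects) + hyponym_examples(categories)
for example in training_set:
    print(example)
```

Pairs produced this way could serve as labelled examples for a classifier such as an SVM. Feature extraction and model training are omitted because the abstract does not specify them.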
