CCE: A Chinese Concept Encyclopedia Incorporating the Expert-Edited Chinese Concept Dictionary with Online Cyclopedias

Bag-of-words is the most commonly used representation in text mining and many other applications. However, it has obvious shortcomings, such as ignoring semantic information, which in document analysis generally plays a more important role than individual words. To tackle this problem, we borrow semantic information from ontologies to model text better. An expert-edited ontology is usually well structured and more authoritative than an online cyclopedia. On the other hand, because editing is costly, expert-edited ontologies struggle to keep up with the deluge of new words. In this paper, we propose a method for constructing a Chinese ontology that keeps the carefully designed structure of an expert-edited ontology while incorporating new vocabulary from an online cyclopedia. We name the enhanced ontology the Chinese Concept Encyclopedia (CCE) and employ it in text mining applications. The experimental results show that CCE outperforms the expert-edited ontology, the Chinese Concept Dictionary (CCD), in these applications.