In information processing, characters are currently defined and shared through coded character sets. Character processing based on coded character sets, however, has two problems: (1) a coded character set may lack characters that are needed, and (2) the characters in a coded character set have fixed semantics. These problems can hinder the implementation of classical text databases for philological studies, and they are especially serious for Kanji (Chinese characters) when digitizing classical texts. To resolve them, we proposed the "Chaon" model, a new model of character processing based on character ontology. Realizing this model requires a character ontology, and for Kanji in particular a large-scale one. We therefore built a large-scale character ontology covering 98 thousand characters, including both Unicode and non-Unicode characters. This paper describes the design principles of a large-scale character ontology based on the Chaon model and gives an overview of its implementation, CHISE (Character Information Service Environment).