A Mechanism for Solving the Unencoded Chinese Character Problem on the Web

Digitizing historical Chinese documents often runs into the unencoded Chinese character problem: characters that appear in the source but have no code point in the character set in use. This makes digital archiving difficult. Expanding the character coding space, for example by adopting the Unicode Standard, does not solve the problem completely, because the set of Chinese characters is open-ended and can always be extended. In this paper, we propose a mechanism based on a Chinese glyph structure database, which contains glyph expressions that represent the composition of Chinese characters. Users can search for Chinese characters through our web interface and browse the search results. Each Chinese character can be embedded in a web document using a specific piece of JavaScript code. When the web document is opened, the JavaScript code loads an image of the Chinese character at an appropriate font size for display. Even if a Chinese character is not available in the database, its image can be generated through the dynamic character composition function. Because the proposed mechanism is cross-platform, users can easily access unencoded Chinese characters without installing any additional font files on their personal computers. A demonstration system is available at http://char.ndap.org.tw.
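The lookup-then-compose flow described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the actual system: the in-memory `GLYPH_DB`, the `example.org` endpoint, the query parameter names, and the use of Unicode Ideographic Description Sequences as a stand-in for the database's own glyph expressions. It only shows the idea of resolving a glyph expression to an image request at a given font size, falling back to dynamic composition when the character is not stored.

```python
from urllib.parse import urlencode

# Hypothetical stand-in for the glyph structure database:
# glyph expression -> identifier of a pre-rendered character image.
GLYPH_DB = {"⿰木木": "img-0001"}

# Hypothetical image service; not the real endpoint of the demonstration system.
BASE_URL = "https://example.org/glyph"


def image_url(expression: str, font_size_px: int) -> str:
    """Build the image URL for the character described by `expression`."""
    if expression in GLYPH_DB:
        # Character is stored: request its existing image by identifier.
        query = {"id": GLYPH_DB[expression], "size": font_size_px}
    else:
        # Character is not stored: ask the service to compose it dynamically.
        query = {"compose": expression, "size": font_size_px}
    return f"{BASE_URL}?{urlencode(query)}"


def img_tag(expression: str, font_size_px: int) -> str:
    """Return an HTML <img> element sized to match the surrounding text."""
    url = image_url(expression, font_size_px)
    return f'<img src="{url}" height="{font_size_px}" alt="{expression}">'
```

In this sketch the font size is passed explicitly; an embedding script in the browser would presumably read it from the surrounding text instead, so that the character image lines up with the adjacent encoded characters.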
