Paper information - Eigencharacter: An Embedding of Chinese Character Orthography

Chinese characters are unique in their logographic nature, which inherently encodes world knowledge accumulated over thousands of years of evolution. This paper proposes an embedding approach, the eigencharacter (EC) space, which helps NLP applications easily access the knowledge encoded in Chinese orthography. These EC representations are extracted automatically, encode both structural and radical information, and integrate easily with other computational models. We built EC representations of 5,000 Chinese characters, investigated the orthographic knowledge encoded in ECs, and demonstrated how these ECs identify visually similar characters using both structural and radical information.
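The abstract does not spell out how the EC space is extracted. The name suggests an eigenface-style decomposition of character images, so the following is only a minimal sketch under that assumption, not the authors' implementation. The toy 4x4 bitmaps and the function names `fit_eigen_space`, `embed`, and `most_similar` are hypothetical and chosen for illustration.

```python
import numpy as np

def fit_eigen_space(images, n_components):
    """Assumed eigenface-style fit: PCA over flattened glyph bitmaps."""
    X = images.reshape(len(images), -1).astype(float)
    mean = X.mean(axis=0)
    # Rows of vt are orthonormal principal directions of the centered data.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def embed(images, mean, components):
    """Project glyph bitmaps onto the retained principal directions."""
    X = images.reshape(len(images), -1).astype(float)
    return (X - mean) @ components.T

def most_similar(query_idx, embeddings):
    """Index of the nearest other glyph by Euclidean distance."""
    dists = np.linalg.norm(embeddings - embeddings[query_idx], axis=1)
    dists[query_idx] = np.inf  # exclude the query itself
    return int(np.argmin(dists))

# Toy stand-ins for rendered characters (hypothetical data, not real glyphs).
glyphs = np.zeros((4, 4, 4))
glyphs[0, :, 0] = 1   # glyph 0: vertical stroke
glyphs[1, :, 0] = 1   # glyph 1: same vertical stroke ...
glyphs[1, 0, 1] = 1   # ... plus one extra dot
glyphs[2, 0, :] = 1   # glyph 2: horizontal stroke at the top
glyphs[3, 3, :] = 1   # glyph 3: horizontal stroke at the bottom

mean, components = fit_eigen_space(glyphs, n_components=3)
emb = embed(glyphs, mean, components)
nearest = most_similar(0, emb)  # glyph 1 differs from glyph 0 by one pixel
```

In a realistic setting the bitmaps would be rendered from a font at a much higher resolution, and the number of retained components would be a tuning choice rather than the small value used here.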

Yu-Hsiang Tseng | Shu-Kai Hsieh
