Visually and Phonologically Similar Characters in Incorrect Simplified Chinese Words

Visually and phonologically similar characters are major contributing factors to errors in Chinese text. By defining appropriate similarity measures that take extended Cangjie codes into account, we can identify visually similar characters within a fraction of a second. Using the pronunciation information recorded for individual characters in Chinese lexicons, we can compute a list of characters that are phonologically similar to a given character. We collected 621 incorrect Chinese words reported on the Internet and analyzed the causes of these errors: 83% were related to phonological similarity and 48% to visual similarity between the involved characters (some errors involve both kinds of similarity). The lists of phonologically and visually similar characters generated by our programs contained more than 90% of the incorrect characters in the reported errors.
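To make the two kinds of candidate lists concrete, here is a minimal Python sketch under stated assumptions. The Cangjie table and the pinyin lexicon are tiny hypothetical excerpts, so verify any code against a full table before relying on it. Visual similarity is scored with a Dice coefficient over the multiset of plain Cangjie letters. This is a stand-in chosen for illustration, not the measure defined over extended Cangjie codes in this work. Phonological candidates are characters that share a syllable with the target, grouped by whether the tone also matches. The function names (`visual_similarity`, `visually_similar`, `phonologically_similar`, `candidates`, `coverage`) and the 0.6 threshold are hypothetical.

```python
from collections import Counter

# Hypothetical toy excerpt of a Cangjie table (plain codes, no extensions).
CANGJIE = {
    "木": "D", "林": "DD", "森": "DDD", "本": "DM",
    "末": "DJ", "未": "JD", "休": "OD", "体": "ODM",
}

# Hypothetical toy pronunciation lexicon: pinyin with a trailing tone digit.
# Polyphonic characters list every reading.
READINGS = {
    "在": ["zai4"], "再": ["zai4"], "载": ["zai3", "zai4"], "灾": ["zai1"],
    "未": ["wei4"], "位": ["wei4"], "微": ["wei1"], "末": ["mo4"],
}


def visual_similarity(a: str, b: str) -> float:
    """Dice coefficient over the multisets of Cangjie letters (assumed measure)."""
    ca, cb = Counter(CANGJIE[a]), Counter(CANGJIE[b])
    shared = sum((ca & cb).values())  # & keeps the minimum count of each letter
    return 2 * shared / (sum(ca.values()) + sum(cb.values()))


def visually_similar(ch: str, threshold: float = 0.6) -> list[str]:
    """Characters scoring at least `threshold` against ch, most similar first."""
    scored = [(visual_similarity(ch, other), other)
              for other in CANGJIE if other != ch]
    scored.sort(key=lambda pair: -pair[0])  # stable sort: ties keep table order
    return [other for score, other in scored if score >= threshold]


def split_tone(reading: str) -> tuple[str, str]:
    """Split a reading such as 'zai4' into ('zai', '4')."""
    return reading[:-1], reading[-1]


def phonologically_similar(ch: str) -> dict[str, list[str]]:
    """Group characters sharing a syllable with ch by whether the tone matches too."""
    own = [split_tone(r) for r in READINGS[ch]]
    own_syllables = {syllable for syllable, _ in own}
    groups: dict[str, list[str]] = {"same_tone": [], "different_tone": []}
    for other, readings in READINGS.items():
        if other == ch:
            continue
        theirs = [split_tone(r) for r in readings]
        if any(reading in own for reading in theirs):
            groups["same_tone"].append(other)
        elif any(syllable in own_syllables for syllable, _ in theirs):
            groups["different_tone"].append(other)
    return groups


def candidates(ch: str) -> set[str]:
    """Union of the visual and phonological candidate lists for ch."""
    result = set(visually_similar(ch)) if ch in CANGJIE else set()
    if ch in READINGS:
        for group in phonologically_similar(ch).values():
            result.update(group)
    return result


def coverage(pairs: list[tuple[str, str]]) -> float:
    """Share of (correct, incorrect) pairs whose incorrect character is a candidate."""
    hits = sum(1 for correct, wrong in pairs if wrong in candidates(correct))
    return hits / len(pairs)


if __name__ == "__main__":
    print(visually_similar("未"))        # ['末', '木']
    print(phonologically_similar("在"))  # {'same_tone': ['再', '载'], 'different_tone': ['灾']}
    # Hypothetical (correct, incorrect) pairs, not data from the study.
    print(coverage([("在", "再"), ("未", "末"), ("林", "体")]))  # 2 of 3 pairs covered
```

The union in `candidates` reflects the idea of checking whether an incorrect character falls in either list. `coverage` computes the share of error pairs caught this way, here over three made-up pairs rather than the 621 collected words. Because the multiset measure ignores letter order, 未 (JD) and 末 (DJ) score as identical. A measure that also uses component position or layout would separate them.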
