Learning to Recognize Hand-Printed Chinese Characters Using Inductive Logic Programming

Recognition of Chinese characters has been a major interest of researchers for many years, and a large number of research papers and reports have already been published in this area. There are several major problems: Chinese characters are distinct and ideographic, the character set is very large, and it contains many structurally similar characters. Thus, classification criteria are difficult to find. This paper presents a new technique for the recognition of hand-printed Chinese characters using machine learning. Conventional methods have relied on hand-constructed dictionaries, which are tedious to construct and difficult to make tolerant to variations in writing styles. The advantages of machine learning are twofold: it can generalize over the large degree of variation between writing styles, and recognition rules can be constructed from examples. The paper also describes three methods of feature extraction for Chinese character recognition: regular expressions, dominant points and a modified Hough transform. These methods are then compared in terms of accuracy and efficiency.