A Suffix Tree Based Handwritten Chinese Address Recognition System

The main contribution of the paper is that it presents a suffix tree based data structure for automatic handwritten Chinese address reading. Since lots of papers have discussed the destination address block (DAB) location for Chinese, we will not extend it in this paper. Instead, we pay more attention to improve the address matching performance after DAB location. As some conventional methods, the extracted text lines are pre-segmented into a series of radicals. We then build a hierarchical structure of sub-strings from the recognized characters of valid radical combinations. Coarse address candidates are selected at the same time. In address maching, we incorporate postcode information to filter redundant addresses. The pre- segmented radicals are compared with candidate address and a cost function combining recognition and structrual cost is evaluated for final decision. In the system, character segmentation, recognition, string searching and matching are considered synchronously by taking advantage of lexicon knowledge. Suffix tree can greatly facilitate the substring generation process and enable the matching process to start from any character to collect potentially bitty information. Therefore, our algorithms is more robust to the intervening noises and irregular writing styles. Finallly, we test 1,000 handwritten Chinese envelopes and achieve a correct rate of 85.30% in 3.0 seconds per mail averagely.