Multilingual character segmentation using matching rate

Publisher Summary: Character segmentation strongly affects the performance of an optical character recognition (OCR) system, and it is especially difficult when one character splits into two or three components or when several characters touch each other. Several methods have been proposed to solve these problems. However, heuristically driven character pitch information is not sufficient to resolve the splitting and touching problems that occur in documents containing characters of various sizes and styles. A multistage graph search algorithm using dynamic programming can improve the segmentation results, but its computing time increases combinatorially. This chapter describes a character segmentation method that uses the matching rate between an input character and two finally selected candidate characters, for documents that consist of alphanumeric characters, symbols, and Korean and Chinese characters. The method determines the exact cutting and merging points one character at a time, and consequently it requires little computing time. The experimental results show that the proposed method is efficient and accurate enough to enhance the performance of a document recognition system.
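The idea of deciding merge points one character at a time from a matching rate can be illustrated with a minimal sketch. It is not the chapter's algorithm: the summary does not define the matching rate or how the two final candidates are selected, so the pixel-agreement measure, the greedy loop, and every name below (`matching_rate`, `best_match`, `segment`, `TEMPLATES`, `max_merge`) are assumptions made for illustration only. The sketch covers only the merging of split components, not the cutting of touching characters.

```python
from typing import Dict, List, Tuple

# A glyph is a tuple of columns; each column is a string of '0'/'1' pixels.
Glyph = Tuple[str, ...]


def matching_rate(glyph: Glyph, template: Glyph) -> float:
    """Assumed definition: fraction of agreeing pixels, 0.0 if shapes differ."""
    if len(glyph) != len(template):
        return 0.0
    total = agree = 0
    for g_col, t_col in zip(glyph, template):
        if len(g_col) != len(t_col):
            return 0.0
        total += len(g_col)
        agree += sum(a == b for a, b in zip(g_col, t_col))
    return agree / total if total else 0.0


def best_match(glyph: Glyph, templates: Dict[str, Glyph]) -> Tuple[float, str]:
    """Return (rate, label) of the template with the highest matching rate."""
    return max((matching_rate(glyph, t), label) for label, t in templates.items())


def segment(components: List[Glyph], templates: Dict[str, Glyph],
            max_merge: int = 3) -> List[str]:
    """Greedy left-to-right pass: at each position, merge 1..max_merge
    components and keep the span whose best matching rate is highest."""
    labels: List[str] = []
    i = 0
    while i < len(components):
        best_rate, best_label, best_n = -1.0, "", 1
        for n in range(1, min(max_merge, len(components) - i) + 1):
            # Merging adjacent components is just concatenating their columns.
            merged = tuple(col for comp in components[i:i + n] for col in comp)
            rate, label = best_match(merged, templates)
            if rate > best_rate:
                best_rate, best_label, best_n = rate, label, n
        labels.append(best_label)
        i += best_n
    return labels


# Hypothetical 3-pixel-high templates, columns read top to bottom.
TEMPLATES: Dict[str, Glyph] = {
    "H": ("111", "010", "111"),
    "L": ("111", "001"),
}

# An "H" broken into three one-column components, followed by an intact "L".
pieces: List[Glyph] = [("111",), ("010",), ("111",), ("111", "001")]
print(segment(pieces, TEMPLATES))  # ['H', 'L']
```

Because each position is resolved once and the scan never backtracks, the work grows linearly with the number of components, in contrast to the combinatorial cost attributed above to the graph search approach. On ties the sketch keeps the span with fewer components, which is an arbitrary choice.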
