Segmentation of touching characters in printed Devnagari and Bangla scripts using fuzzy multifactorial analysis

One of the important reasons for poor recognition rate in optical character recognition (OCR) system is the error in character segmentation. Existence of touching characters in the scanned documents is a major problem to design an effective character segmentation procedure. In this paper, a new technique is presented for identification and segmentation of touching characters. The technique is based on fuzzy multifactorial analysis. A predictive algorithm is developed for effectively selecting possible cut columns for segmenting the touching characters. The proposed method has been applied to printed documents in Devnagari and Bangla: the two most popular scripts of the Indian sub-continent. The results obtained from a test-set of considerable size show that a reasonable improvement in recognition rate can be achieved with a modest increase in computations.

[1]  Alfred V. Aho,et al.  Compilers: Principles, Techniques, and Tools , 1986, Addison-Wesley series in computer science / World student series edition.

[2]  H. Li Multifactorial functions in fuzzy sets theory , 1990 .

[3]  Yasuaki Nakano,et al.  Segmentation methods for character recognition: from segmentation to document structure analysis , 1992, Proc. IEEE.

[4]  Theodosios Pavlidis,et al.  A Shape Analysis Model with Applications to a Character Recognition System , 1994, IEEE Trans. Pattern Anal. Mach. Intell..

[5]  Lawrence O'Gorman,et al.  The Document Spectrum for Page Layout Analysis , 1993, IEEE Trans. Pattern Anal. Mach. Intell..

[6]  Yi Lu On the segmentation of touching characters , 1993, Proceedings of 2nd International Conference on Document Analysis and Recognition (ICDAR '93).

[7]  Bidyut Baran Chaudhuri,et al.  A complete printed Bangla OCR system , 1998, Pattern Recognit..

[8]  Azriel Rosenfeld,et al.  A method of detecting the orientation of aligned components , 1986, Pattern Recognit. Lett..

[9]  Hong Yan,et al.  Skew Correction of Document Images Using Interline Cross-Correlation , 1993, CVGIP Graph. Model. Image Process..

[10]  Eric Lecolinet,et al.  A Survey of Methods and Strategies in Character Segmentation , 1996, IEEE Trans. Pattern Anal. Mach. Intell..

[11]  Mindy Bokser,et al.  Omnidocument technologies , 1992, Proc. IEEE.

[12]  Bidyut Baran Chaudhuri,et al.  Skew Angle Detection of Digitized Indian Script Documents , 1997, IEEE Trans. Pattern Anal. Mach. Intell..

[13]  Theodosios Pavlidis,et al.  On the Recognition of Printed Characters of Any Font and Size , 1987, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  Bidyut Baran Chaudhuri,et al.  An OCR system to read two Indian language scripts: Bangla and Devnagari (Hindi) , 1997, Proceedings of the Fourth International Conference on Document Analysis and Recognition.

[15]  Dave Elliman,et al.  A review of segmentation and contextual analysis techniques for text recognition , 1990, Pattern Recognit..

[16]  Hongxing Li,et al.  Fuzzy Sets and Fuzzy Decision-Making , 1995 .

[17]  V. A. Kovalevsky,et al.  Character readers and pattern recognition , 1968 .

[18]  Pei-Zhuang Wang,et al.  A factor spaces approach to knowledge representation , 1990 .

[19]  Ulrich Kressel,et al.  Cut classification for segmentation , 1993, Proceedings of 2nd International Conference on Document Analysis and Recognition (ICDAR '93).

[20]  Haruo Asada,et al.  Major components of a complete text reading system , 1992 .

[21]  Majid Ahmadi,et al.  Segmentation of touching characters in printed document recognition , 1994, Pattern Recognit..