Segmenting Bangla text for optical recognition

Character segmentation errors are one of the main causes of poor recognition rates in optical character recognition (OCR) systems. The presence of different types of characters in scanned documents makes it difficult to design an effective character segmentation procedure. This paper presents a new technique for identifying and segmenting printed Bengali characters so that they can be recognized efficiently. Our line segmentation success rate is 99.7% on the 1000 lines we tested, and our word segmentation success rate is 99.8% on the 4900 words we tested. In our experiments, isolated characters fell into the isolated group in 99.50% of cases. Most errors come from connected characters and from characters with tau in front of them, because we rely on width to segment tau. In particular, most of the errors came from components with multiple touching points between two characters.
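The abstract does not state how lines and words are located. A common baseline for printed text is projection-profile analysis: count ink pixels per row to find lines, then per column within a line to find words. The sketch below illustrates that baseline only; it is an assumption, not the authors' method. The function names, the binary list-of-lists image format, and the `min_gap` parameter are all hypothetical.

```python
def segment_lines(image):
    """Return (start, end) row ranges (end exclusive) of text lines.

    `image` is a list of rows, each a list of 0/1 pixels (1 = ink).
    Assumed baseline: a line is a maximal run of rows containing ink.
    """
    profile = [sum(row) for row in image]  # ink count per row
    lines = []
    start = None
    for y, ink in enumerate(profile):
        if ink > 0 and start is None:
            start = y                      # first inked row of a line
        elif ink == 0 and start is not None:
            lines.append((start, y))       # blank row closes the line
            start = None
    if start is not None:                  # line touching the bottom edge
        lines.append((start, len(profile)))
    return lines


def segment_words(image, line, min_gap=2):
    """Return (start, end) column ranges (end exclusive) of words in one line.

    A word ends when at least `min_gap` consecutive blank columns follow it
    (`min_gap` is a hypothetical tuning parameter).
    """
    top, bottom = line
    width = len(image[0]) if image else 0
    # Ink count per column, restricted to the rows of this line.
    profile = [sum(image[y][x] for y in range(top, bottom)) for x in range(width)]
    words = []
    start = None
    gap = 0
    for x, ink in enumerate(profile):
        if ink > 0:
            if start is None:
                start = x
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                words.append((start, x - gap + 1))
                start = None
                gap = 0
    if start is not None:                  # word running to the right edge
        words.append((start, width - gap))
    return words


# Tiny synthetic page: two lines; the first line holds two words.
page = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 1, 1, 0],
    [1, 1, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0, 0],
]
lines = segment_lines(page)             # [(1, 3), (4, 5)]
words = segment_words(page, lines[0])   # [(0, 2), (5, 7)]
```

Column profiles of this kind separate words but generally not the characters inside a Bengali word, since the headline (matra) joins them; that is consistent with the abstract's observation that connected and touching characters account for most of the remaining errors.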
