A robust page segmentation method for Persian/Arabic documents

Optical Character Recognition (OCR) software is widely used in office automation systems. One of the first steps in recognizing a document is segmenting the input image. Various methods have been proposed for English; for Persian/Arabic, however, no complete method has yet been found. In this paper we present a new page segmentation method for printed Persian/Arabic texts. The method is inspired by the way ink spreads on paper. One of its most important characteristics is its insensitivity to rotation.
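The abstract gives no algorithmic detail, so what follows is only an illustration under our own assumptions, not the authors' method. We model "ink spreading" as dilating every ink pixel of a binary page image by a disk of a hypothetical radius `radius`, and then treat each connected blob of spread ink as one region. A disk grows equally in all directions, unlike the purely horizontal and vertical smearing of run-length smoothing, which is one plausible way a spreading-based approach could tolerate rotated pages. The names `spread_ink`, `label_regions`, and `segment` are hypothetical.

```python
from collections import deque


def spread_ink(img, radius):
    """Hypothetical ink-spreading step: dilate every ink pixel (1) by a disk of `radius`."""
    h, w = len(img), len(img[0])
    # Disk offsets: the same reach in every direction (rotation-tolerant up to discretization).
    offsets = [(dy, dx)
               for dy in range(-radius, radius + 1)
               for dx in range(-radius, radius + 1)
               if dy * dy + dx * dx <= radius * radius]
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                for dy, dx in offsets:
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        out[ny][nx] = 1
    return out


def label_regions(img):
    """4-connected component labelling by breadth-first search; returns (labels, count)."""
    h, w = len(img), len(img[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for sy in range(h):
        for sx in range(w):
            if img[sy][sx] and not labels[sy][sx]:
                count += 1
                labels[sy][sx] = count
                queue = deque([(sy, sx)])
                while queue:
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


def segment(img, radius):
    """Group the ORIGINAL ink pixels by the spread blob they fall in.

    Returns one (top, left, bottom, right) bounding box per region, ordered by label.
    """
    labels, _ = label_regions(spread_ink(img, radius))
    boxes = {}
    for y, row in enumerate(img):
        for x, ink in enumerate(row):
            if ink:
                lab = labels[y][x]  # every ink pixel survives spreading, so lab > 0
                top, left, bottom, right = boxes.get(lab, (y, x, y, x))
                boxes[lab] = (min(top, y), min(left, x), max(bottom, y), max(right, x))
    return [boxes[k] for k in sorted(boxes)]
```

For example, on a 7×12 blank image with ink at (1, 1), (1, 3), and (5, 10), `segment(img, 1)` returns `[(1, 1, 1, 3), (5, 10, 5, 10)]` because the two nearby pixels bleed into each other, while `segment(img, 0)` (no spreading) keeps all three pixels as separate regions. In this sketch the spreading radius plays the role of a merging threshold; how the actual method chooses it is not stated in the abstract.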