Paper Information - Multi-oriented English Text Line Identification

Multi-oriented English Text Line Identification

Many artistic documents contain text lines with different inclinations (orientations) on a single page. To enhance the ability of a document analysis system, text lines must be extracted in multiple orientations. In this paper, we propose a robust technique to detect English text lines of arbitrary orientation in a single document page. The approach is bottom-up: the connected components are first labelled and then clustered into word groups. Text lines of arbitrary orientation are identified from the estimation of these word groups. In an experiment on 3700 text lines, the proposed method achieved an accuracy of 98.3%.
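The bottom-up pipeline described in the abstract (label connected components, cluster them into groups, estimate each group's orientation) can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not specify the clustering rule or the orientation estimate, so the centroid-distance threshold `max_dist`, the union-find grouping, and the principal-axis angle are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import math
from collections import deque

def label_components(grid):
    """Return the 8-connected foreground (value 1) components as pixel lists."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or seen[r][c]:
                continue
            seen[r][c] = True
            queue, pixels = deque([(r, c)]), []
            while queue:  # breadth-first flood fill from the seed pixel
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and grid[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components

def centroid(pixels):
    """Mean (row, column) position of a component."""
    n = len(pixels)
    return (sum(y for y, _ in pixels) / n, sum(x for _, x in pixels) / n)

def group_components(centroids, max_dist):
    """Assumed rule: link components whose centroids lie within max_dist."""
    parent = list(range(len(centroids)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(centroids)):
        for j in range(i + 1, len(centroids)):
            if math.dist(centroids[i], centroids[j]) <= max_dist:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(centroids)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

def group_angle(points):
    """Principal-axis angle in degrees of (row, column) points.

    Rows grow downward in image coordinates, so a positive angle means
    the group descends toward the right.
    """
    n = len(points)
    my = sum(p[0] for p in points) / n
    mx = sum(p[1] for p in points) / n
    sxx = sum((p[1] - mx) ** 2 for p in points)
    syy = sum((p[0] - my) ** 2 for p in points)
    sxy = sum((p[1] - mx) * (p[0] - my) for p in points)
    return math.degrees(0.5 * math.atan2(2 * sxy, sxx - syy))
```

In a real page image, a fixed distance threshold would likely need to scale with character size; it is kept constant here only to keep the sketch short.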

Bidyut Baran Chaudhuri | Umapada Pal | Suranjit Sinha

[1] Lawrence O'Gorman, et al. The Document Spectrum for Page Layout Analysis, 1993, IEEE Trans. Pattern Anal. Mach. Intell.

[2] Frank Hönes, et al. Layout extraction of mixed mode documents, 2005, Machine Vision and Applications.

[3] Hirotomo Aso, et al. Extracting curved text lines using local linearity of the text line, 1999, International Journal on Document Analysis and Recognition.

[4] Mahesh Viswanathan, et al. A prototype document image analysis system for technical journals, 1992, Computer.

[5] Bidyut Baran Chaudhuri, et al. Multi-skew detection of Indian script documents, 2001, Proceedings of Sixth International Conference on Document Analysis and Recognition.

[6] Rangachar Kasturi, et al. A Robust Algorithm for Text String Separation from Mixed Text/Graphics Images, 1988, IEEE Trans. Pattern Anal. Mach. Intell.