English, Devnagari and Urdu Text Identification

In a multilingual, multi-script country like India, a single text line of a document page may contain words in two or more scripts. For Optical Character Recognition (OCR) of such a page, it is necessary to identify the different scripts present in the document. This paper proposes an automatic technique for word-wise identification of English, Devnagari and Urdu scripts in a single document. The document is first segmented into lines, and the lines are then segmented into candidate words. The identification scheme is built on the characteristic features of the different scripts.
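
The line-then-word segmentation step can be illustrated with a minimal sketch. The abstract does not say how segmentation is performed, so the projection-profile approach used here is an assumption, as are the names `runs` and `segment_words`, the `word_gap` parameter, and the representation of the page as a binary grid (1 for ink, 0 for background).

```python
def runs(profile, min_gap=1):
    """Return (start, end) spans of non-zero entries in a projection profile.

    Spans separated by fewer than min_gap zero entries are merged.
    The end index is exclusive.
    """
    spans = []
    start = None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i                      # a run of ink begins
        elif value == 0 and start is not None:
            spans.append((start, i))       # the run ends at a blank entry
            start = None
    if start is not None:
        spans.append((start, len(profile)))

    merged = []
    for s, e in spans:
        if merged and s - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], e)  # gap too small: same unit
        else:
            merged.append((s, e))
    return merged


def segment_words(image, word_gap=2):
    """Split a binary page image into word boxes (top, bottom, left, right).

    Assumption: text lines are horizontal and separated by blank rows, and
    words are separated by at least word_gap blank columns.
    """
    row_profile = [sum(row) for row in image]          # ink count per row
    words = []
    for top, bottom in runs(row_profile):
        line = image[top:bottom]
        col_profile = [sum(col) for col in zip(*line)]  # ink count per column
        for left, right in runs(col_profile, min_gap=word_gap):
            words.append((top, bottom, left, right))
    return words
```

Each returned box could then be passed to a per-word script classifier. The value of `word_gap` would need tuning per document, since gaps between characters and gaps between words differ across scripts and fonts.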
