Identification of scripts and orientations of degraded document images

Document script and document orientation are important information for document digitization. Methods for identifying both have been reported in prior work, but most of them are very sensitive to document skew and low image resolution. This paper reports a document script and document orientation identification method that addresses this issue by converting a document image into a pair of document vectors using the density and distribution of character strokes. Experiments over 3,024 document images of 12 scripts show that the proposed method is accurate and tolerant of various types of document degradation.
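The abstract does not spell out how the two document vectors are built, so the following Python sketch is only an illustration of the general idea, not the paper's algorithm. It assumes a binarized page (1 = ink), takes the number of vertical ink runs per pixel column as a stand-in for stroke density, and takes the vertical zone in which each run starts as a stand-in for stroke distribution. The function names, the `max_runs` and `zones` parameters, and the nearest-template matching step are all hypothetical choices made for this example.

```python
import numpy as np


def run_starts(column):
    """Return the row indices where a vertical run of ink pixels begins."""
    padded = np.concatenate(([0], np.asarray(column).astype(np.int8)))
    # A run begins wherever the value steps from 0 (background) to 1 (ink).
    return np.flatnonzero(np.diff(padded) == 1)


def document_vectors(binary, max_runs=4, zones=3):
    """Build an illustrative (density, distribution) vector pair.

    binary   : 2-D array, 1 for ink and 0 for background (assumed input).
    max_runs : run counts above this value share the last density bin.
    zones    : number of equal-height bands used for the distribution.
    """
    binary = np.asarray(binary)
    height = binary.shape[0]
    density = np.zeros(max_runs)
    distribution = np.zeros(zones)
    for column in binary.T:
        starts = run_starts(column)
        if starts.size == 0:
            continue  # blank column, no strokes crossed
        # Density: how many strokes does this scan line cross?
        density[min(starts.size, max_runs) - 1] += 1
        # Distribution: in which vertical band does each stroke begin?
        for row in starts:
            distribution[min(int(row) * zones // height, zones - 1)] += 1
    # Normalize so the vectors do not depend on page size.
    if density.sum() > 0:
        density /= density.sum()
    if distribution.sum() > 0:
        distribution /= distribution.sum()
    return density, distribution


def nearest_label(vector, templates):
    """Pick the template label with the smallest Euclidean distance."""
    return min(templates, key=lambda label: np.linalg.norm(vector - templates[label]))


# Tiny synthetic "page" used only to exercise the functions.
page = np.array([
    [1, 0, 0],
    [0, 0, 0],
    [1, 0, 1],
    [0, 0, 1],
    [0, 0, 0],
    [0, 0, 1],
])
density, distribution = document_vectors(page)
```

In a setting like the one the abstract describes, one vector could be compared against per-script templates and the other against per-orientation templates using something like `nearest_label`; how the paper actually assigns and matches the two vectors is not stated here. Because both vectors are normalized histograms, they do not depend on the absolute page size, which is one plausible reason statistics of this kind might tolerate low resolution.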
