Identification of the script in an image of a document page is of primary importance for a system processing multilingual documents. In this paper, two trainable classification schemes are proposed for the identification of Indian scripts. Both schemes use connected components extracted from the textual region. The first classifier uses a novel Gabor filter-based feature extraction scheme for the connected components. We have also found that the pixel distribution of a connected component captures its shape and can therefore form the basis for script recognition. Experiments show that the features extracted by the Gabor filter-based scheme provide the most reliable performance. The other technique, based on pixel distribution, is simpler, computationally more efficient, and gives reasonably good performance. The decisions of the two classifiers are combined using the logistic regression method, and the combination shows improved recognition performance compared with either individual classifier.
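The abstract does not give the exact feature definitions or the fusion model, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes (i) Gabor features are the mean and standard deviation of the magnitude of an even-symmetric Gabor response at a few orientations, computed on a binary connected-component image, and (ii) the logistic regression combination is a two-class model over each classifier's confidence that a component belongs to a hypothesised script. All function names and parameter values (`n_orientations`, `freq`, `sigma`, `size`, `lr`, `epochs`) are hypothetical illustrations.

```python
import numpy as np


def gabor_kernel(size, sigma, theta, freq):
    """Even-symmetric (cosine) Gabor kernel of shape (size, size); size should be odd."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    x_rot = x * np.cos(theta) + y * np.sin(theta)  # coordinate along the carrier
    y_rot = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_rot ** 2 + y_rot ** 2) / (2.0 * sigma ** 2))
    return envelope * np.cos(2.0 * np.pi * freq * x_rot)


def filter_same(image, kernel):
    """Zero-padded 2-D filtering; the output has the same shape as `image`.

    The even Gabor kernel satisfies k(-x, -y) = k(x, y), so correlation
    (what the loop computes) equals convolution here.
    """
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    h, w = image.shape
    out = np.zeros((h, w))
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + h, j:j + w]
    return out


def gabor_features(component, n_orientations=4, freq=0.25, sigma=2.0, size=7):
    """Hypothetical feature vector: [mean|r|, std|r|] for each orientation."""
    img = component.astype(float)
    feats = []
    for k in range(n_orientations):
        theta = k * np.pi / n_orientations
        resp = np.abs(filter_same(img, gabor_kernel(size, sigma, theta, freq)))
        feats.extend([resp.mean(), resp.std()])
    return np.array(feats)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic_fusion(scores, labels, lr=0.5, epochs=2000):
    """Fit w so that sigmoid(w0 + w1*s1 + w2*s2 + ...) predicts 0/1 `labels`.

    `scores` has shape (n_samples, n_classifiers): each classifier's
    confidence that the sample belongs to the hypothesised script.
    """
    X = np.hstack([np.ones((scores.shape[0], 1)), scores])  # prepend bias column
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = sigmoid(X @ w)
        w -= lr * X.T @ (p - labels) / len(labels)  # gradient of mean log-loss
    return w


def fused_score(w, scores):
    """Combined probability that each sample belongs to the hypothesised script."""
    X = np.hstack([np.ones((scores.shape[0], 1)), scores])
    return sigmoid(X @ w)
```

In use, each connected component would be mapped to `gabor_features(component)` and passed to the first classifier; the per-script confidences of the Gabor-based and pixel-distribution classifiers then form the rows of `scores` for `fit_logistic_fusion`, and `fused_score` gives the combined decision. Plain gradient descent is used here only to keep the sketch dependency-free beyond NumPy.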