Paper Information - Optimal Feature Extraction for Bilingual OCR

Optimal Feature Extraction for Bilingual OCR

Feature extraction in bilingual OCR is handicapped by the increase in the number of classes, or characters, to be handled. This is evident in the case of Indian languages, whose alphabet sets are large. The complexity of the feature extraction process is expected to increase with the number of classes. Although no quantitative measure can determine the best set of features to use, the characteristics of the scripts can help decide on the feature extraction procedure. This paper describes a hierarchical feature extraction scheme for the recognition of printed bilingual (Tamil and Roman) text. The scheme divides the combined alphabet set of the two scripts into subsets by extracting certain spatial and structural features. Three kinds of features, namely geometric moments, DCT-based features, and wavelet-transform-based features, are extracted from the grouped symbols, and a linear transformation is applied to them for efficient representation in the feature space. The transformation is obtained by maximizing certain criterion functions. Three techniques have been employed to estimate the transformation matrix: principal component analysis, maximization of Fisher's ratio, and maximization of a divergence measure. It has been observed that the proposed hierarchical scheme allows for easier handling of the alphabets and that the transformations yield an appreciable rise in recognition accuracy.
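As a rough illustration of the pipeline the abstract outlines, the sketch below computes raw geometric moments from symbol images and then estimates a linear transformation by principal component analysis. It is not the authors' implementation: the function names `geometric_moments` and `pca_transform`, the moment order, the number of retained components, and the random binary images standing in for segmented symbols are all assumptions made for illustration. The hierarchical grouping step, the DCT and wavelet features, and the Fisher's ratio and divergence criteria are not shown.

```python
import numpy as np

def geometric_moments(img, max_order=2):
    """Raw geometric moments m_pq = sum over pixels of x^p * y^q * f(x, y),
    for all p, q >= 0 with p + q <= max_order (hypothetical helper)."""
    img = np.asarray(img, dtype=float)
    ys, xs = np.mgrid[0:img.shape[0], 0:img.shape[1]]
    feats = []
    for p in range(max_order + 1):
        for q in range(max_order + 1 - p):
            feats.append(np.sum((xs ** p) * (ys ** q) * img))
    return np.array(feats)

def pca_transform(X, n_components):
    """Estimate a PCA projection from the rows of X.
    Returns (W, mean), where the columns of W are the eigenvectors of the
    sample covariance with the largest eigenvalues (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order], mean

# Synthetic stand-in for segmented symbols: 50 random 16x16 binary images.
rng = np.random.default_rng(0)
symbols = rng.integers(0, 2, size=(50, 16, 16))

X = np.stack([geometric_moments(s) for s in symbols])  # one feature row per symbol
W, mean = pca_transform(X, n_components=3)
Y = (X - mean) @ W                                      # features in the reduced space
```

PCA is used here only because it is the simplest of the three criteria named in the abstract; a Fisher's ratio or divergence criterion would additionally need class labels for each symbol.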

A. G. Ramakrishnan | D. Dhanya
