Abstract We develop an Arabic Optical Character Recognition (AOCR) system with five stages: preprocessing, segmentation, thinning, feature extraction, and classification. In the preprocessing stage, we compare two skew estimation algorithms, namely skew estimation by image moments and by skew triangle; we also implement binarization and a median filter. In the thinning stage, we use the Hilditch thinning algorithm augmented with two templates, one to prevent superfluous tails and the other to remove unnecessary interest points. In the segmentation stage, line segmentation is done by horizontal projection cross-verified by standard deviation, sub-word segmentation by connected pixel components, and letter segmentation by the Zidouri algorithm. In the feature extraction stage, 24 features are extracted, which fall into three groups: main body features, perimeter-skeleton features, and secondary object features. In the classification stage, we use a decision tree generated by the C4.5 algorithm. Functionality tests showed that skew estimation using moments is more accurate than using the skew triangle, that the median filter tends to erode letter shapes, and that adding the templates to the Hilditch algorithm gives good results. Performance tests yielded the following results. Line segmentation had 99.9% accuracy, and the standard deviation check was shown to reduce over-segmentation and quasi-lines. Letter segmentation had 74% accuracy, tested on six different fonts. The classification component had 82% accuracy, tested by cross-validation. Unfortunately, the overall performance of the system reached only 48.3%.
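As a rough illustration of two of the preprocessing and segmentation steps named above, the following Python sketch estimates skew from second-order central image moments and splits a page into lines with a horizontal projection profile. It is a minimal sketch and not the system's implementation: the function names `skew_angle_by_moments` and `segment_lines`, the list-of-lists binary image format (1 = ink), and the rule that a line ends at any empty row are all assumptions. The standard deviation cross-verification is not reproduced, since the abstract does not specify it.

```python
import math


def skew_angle_by_moments(image):
    """Estimate the orientation, in degrees, of the ink pixels in a binary image.

    image: list of rows, each a list of 0/1 values (1 = ink). Hypothetical format.
    Uses the standard moment formula theta = 0.5 * atan2(2*mu11, mu20 - mu02).
    The angle is in image coordinates (y grows downward).
    """
    points = [(x, y) for y, row in enumerate(image)
              for x, v in enumerate(row) if v]
    if not points:
        return 0.0
    n = len(points)
    cx = sum(x for x, _ in points) / n  # centroid x
    cy = sum(y for _, y in points) / n  # centroid y
    mu20 = sum((x - cx) ** 2 for x, _ in points)
    mu02 = sum((y - cy) ** 2 for _, y in points)
    mu11 = sum((x - cx) * (y - cy) for x, y in points)
    return math.degrees(0.5 * math.atan2(2 * mu11, mu20 - mu02))


def segment_lines(image):
    """Return inclusive (start_row, end_row) pairs found by horizontal projection.

    Assumption: a text line is a maximal run of rows containing any ink.
    """
    profile = [sum(row) for row in image]  # ink count per row
    lines, start = [], None
    for y, count in enumerate(profile):
        if count > 0 and start is None:
            start = y  # a line begins
        elif count == 0 and start is not None:
            lines.append((start, y - 1))  # a line ends at the previous row
            start = None
    if start is not None:
        lines.append((start, len(profile) - 1))
    return lines
```

On a real scan, the empty-row rule would likely need a noise threshold, which is the kind of over-segmentation problem a verification step such as the one described above is meant to address.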