Machine recognition and correction of printed Arabic text

A method is presented for the automatic recognition of multifont Arabic text scanned at 300 dpi. The system has two components, one for character recognition and one for word recognition. Character recognition is divided into three phases: digitization, segmentation of words into characters, and identification of characters. The word recognition component is based on the Viterbi algorithm and can correct some identification errors. Character recognition was achieved despite several impeding properties of the Arabic script, especially the connectivity of characters. The processing speed is close to three characters per second, with a 90% recognition rate. All algorithms were written in Pascal and run on an IBM PC/AT.
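The abstract does not describe how the Viterbi-based word recognition is set up, so the following is only a minimal sketch of the general technique, written in Python rather than the Pascal of the original system. It assumes a character-bigram model for the language and a confusion table for the character classifier. The function name `viterbi_correct`, the tables `start_p`, `trans_p`, and `emit_p`, and every probability are hypothetical, and Latin letters stand in for Arabic characters.

```python
import math


def viterbi_correct(observed, alphabet, start_p, trans_p, emit_p, floor=1e-6):
    """Return the most likely intended character string for `observed`.

    start_p[c]     -- assumed probability that a word starts with c
    trans_p[p][c]  -- assumed probability that c follows p (bigram model)
    emit_p[c][o]   -- assumed probability that true character c is read as o
    Missing entries fall back to `floor` so that log() is always defined.
    """
    if not observed:
        return ""

    def lg(p):
        return math.log(max(p, floor))

    # best[i][c]: best log-probability of any path ending in c at position i
    best = [{c: lg(start_p.get(c, 0.0)) + lg(emit_p.get(c, {}).get(observed[0], 0.0))
             for c in alphabet}]
    back = []  # back[i][c]: predecessor of c on the best path into position i + 1

    for obs in observed[1:]:
        scores, pointers = {}, {}
        for c in alphabet:
            prev = max(alphabet,
                       key=lambda p: best[-1][p] + lg(trans_p.get(p, {}).get(c, 0.0)))
            scores[c] = (best[-1][prev]
                         + lg(trans_p.get(prev, {}).get(c, 0.0))
                         + lg(emit_p.get(c, {}).get(obs, 0.0)))
            pointers[c] = prev
        best.append(scores)
        back.append(pointers)

    # Trace back from the best final character.
    path = [max(alphabet, key=lambda c: best[-1][c])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return "".join(reversed(path))


# Toy model (all numbers invented for illustration): the classifier
# sometimes confuses "a" and "o", and the bigram model prefers "ca" and "at".
ALPHABET = "acot"
START_P = {"c": 0.7, "a": 0.1, "o": 0.1, "t": 0.1}
TRANS_P = {
    "c": {"a": 0.8, "o": 0.1, "t": 0.05, "c": 0.05},
    "a": {"t": 0.8, "c": 0.1, "a": 0.05, "o": 0.05},
    "o": {"t": 0.3, "c": 0.3, "a": 0.2, "o": 0.2},
    "t": {"a": 0.4, "o": 0.4, "c": 0.1, "t": 0.1},
}
EMIT_P = {
    "a": {"a": 0.6, "o": 0.4},
    "o": {"o": 0.6, "a": 0.4},
    "c": {"c": 1.0},
    "t": {"t": 1.0},
}

if __name__ == "__main__":
    # The misread "cot" is corrected to the more probable "cat".
    print(viterbi_correct("cot", ALPHABET, START_P, TRANS_P, EMIT_P))  # -> cat
```

Working in log-probabilities avoids numerical underflow on longer words. The floor value for unseen transitions and confusions is one simple smoothing choice among several.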