Computer recognition of printed Tamil characters

Abstract. Computer recognition of machine-printed letters of the Tamil alphabet is described. Each character is represented as a binary matrix and encoded into a string using two different methods. The encoded strings form a dictionary. A given text is presented symbol by symbol; the information extracted from each symbol takes the form of a string, which is compared with the strings in the dictionary. When a match is found, the letter is recognized and printed out in Roman letters following a special method of transliteration. The lengthening of vowels and the hardening of consonants are indicated by numerals printed above each letter.
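
The encode-and-look-up pipeline summarized in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the two encoding methods, so the row-wise run-length code used here is only an assumed stand-in, and the names `encode_rows`, `build_dictionary`, and `recognize`, as well as the toy 3x3 glyphs and their Roman labels, are hypothetical and do not correspond to real Tamil letter shapes.

```python
from typing import Dict, List, Optional

Matrix = List[List[int]]  # binary matrix: 1 = ink, 0 = background


def encode_rows(matrix: Matrix) -> str:
    """Encode a binary matrix as a string (assumed scheme, not the paper's).

    Each non-empty row is run-length coded as "<bit>x<count>" items joined
    by commas; rows are joined by "/".
    """
    parts = []
    for row in matrix:
        runs = []
        current, count = row[0], 0
        for bit in row:
            if bit == current:
                count += 1
            else:
                runs.append(f"{current}x{count}")
                current, count = bit, 1
        runs.append(f"{current}x{count}")
        parts.append(",".join(runs))
    return "/".join(parts)


def build_dictionary(templates: Dict[str, Matrix]) -> Dict[str, str]:
    """Map each template's encoded string to its Roman transliteration."""
    return {encode_rows(m): label for label, m in templates.items()}


def recognize(symbol: Matrix, dictionary: Dict[str, str]) -> Optional[str]:
    """Return the label whose stored string equals the symbol's, else None."""
    return dictionary.get(encode_rows(symbol))


# Hypothetical toy glyphs; labels stand in for transliterated letters.
TEMPLATES: Dict[str, Matrix] = {
    "a": [[1, 1, 1], [1, 0, 1], [1, 1, 1]],
    "i": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
}

DICTIONARY = build_dictionary(TEMPLATES)

if __name__ == "__main__":
    text = [TEMPLATES["i"], TEMPLATES["a"], [[0, 0, 0], [0, 0, 0], [0, 0, 0]]]
    # Unmatched symbols are shown as "?" in this sketch.
    print("".join(recognize(s, DICTIONARY) or "?" for s in text))
```

An exact string match is the simplest reading of "agreement" with the dictionary; a real system would presumably need some tolerance for noise in the printed symbol, which this sketch does not attempt.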