On Devanagari document processing

The Devanagari document processing system discussed here makes use of various knowledge sources at all levels. Extraction of the text zone from a document is a preprocessing stage that uses document layout knowledge represented syntactically. The text zone is then segmented into lines, lines into words, and words into characters. Since a Devanagari character is a complex composition of symbols, various algorithms are used to further segment each character into its constituent symbols instead of treating the character as a unit. The symbols are then recognized using various features that are extracted and saved during the training phase. The recognized symbols are recomposed and sent for validation through a partitioned dictionary.
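
The line and word segmentation steps described above can be illustrated with a projection-profile sketch. The abstract does not say which segmentation algorithm the system uses, so the projection-profile technique is an assumption here, and the names `runs`, `segment_lines`, and `segment_columns` are hypothetical. The input is assumed to be an already binarized text zone given as a list of rows, with 1 for ink and 0 for background.

```python
from typing import List, Tuple

Image = List[List[int]]  # binarized text zone: 1 = ink, 0 = background


def runs(profile: List[int]) -> List[Tuple[int, int]]:
    """Return half-open (start, end) index ranges where the profile is non-zero."""
    spans: List[Tuple[int, int]] = []
    start = None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i                      # a run of ink begins
        elif value == 0 and start is not None:
            spans.append((start, i))       # the run ended at the previous index
            start = None
    if start is not None:                  # run reaches the end of the profile
        spans.append((start, len(profile)))
    return spans


def segment_lines(img: Image) -> List[Tuple[int, int]]:
    """Split the text zone into text lines using the horizontal projection profile."""
    return runs([sum(row) for row in img])


def segment_columns(img: Image, top: int, bottom: int) -> List[Tuple[int, int]]:
    """Split one text line (rows top..bottom-1) at blank columns."""
    width = len(img[0]) if img else 0
    profile = [sum(img[r][c] for r in range(top, bottom)) for c in range(width)]
    return runs(profile)
```

Calling `segment_lines(img)` gives the row range of each text line, and `segment_columns(img, top, bottom)` gives the column ranges separated by blank columns within one line. In Devanagari the headline stroke usually joins the characters of a word, so blank columns tend to separate words rather than characters; splitting a word into characters and symbols would need further steps that this sketch does not attempt.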
