Anatomy of a versatile page reader

An experimental printed-page reader that is easy to adapt to various languages is described. Changing the target language may involve simultaneous changes in symbol sets, typefaces, sizes of text, page layouts, linguistic contexts, and imaging defects. The strategy has been to isolate the effects of these sources of variation within separate, independent engineering subsystems. In this way, it has been possible to construct, with a minimum of manual effort, classifiers for arbitrary combinations of symbols, typefaces, sizes, and imaging defects. An attempt has been made to rid the algorithms of all language-specific rules, relying instead on automatic learning from examples and generalized table-driven methods. For some tasks it has been feasible to avoid language dependency altogether. Linguistic context can be exploited through data-directed filtering algorithms in a uniform and modular manner, so that preexisting tools developed by computational linguists can readily be applied. These principles are illustrated by trials on English, Swedish, Tibetan, and special technical texts.
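As an illustration only, the idea of data-directed contextual filtering can be sketched in a few lines of Python. This is not the system described above: the function names `top_choice` and `filter_by_lexicon`, the per-position candidate lists, the scores, and the use of a plain word set as the linguistic table are all assumptions made for the sketch. The point it shows is that the filtering code contains no language-specific rules; switching language means supplying a different table.

```python
from itertools import product


def top_choice(candidates):
    """Return the string built from the best-scoring symbol at each position."""
    return "".join(max(pos, key=lambda c: c[1])[0] for pos in candidates)


def filter_by_lexicon(candidates, lexicon):
    """Pick the best-scoring reading of a word that the lexicon accepts.

    candidates: list of positions; each position is a list of
                (symbol, score) pairs from a hypothetical classifier.
    lexicon:    a set of valid words. This is the only language-dependent
                input; the code itself is the same for every language.
    Falls back to the unfiltered top choice when no reading is in the lexicon.
    """
    best_word, best_score = None, float("-inf")
    # Enumerate every combination of one candidate per position.
    for combo in product(*candidates):
        word = "".join(sym for sym, _ in combo)
        if word in lexicon:
            score = sum(s for _, s in combo)
            if score > best_score:
                best_word, best_score = word, score
    return best_word if best_word is not None else top_choice(candidates)


# Hypothetical classifier output: 'e'/'c' and 't'/'l' are confusable shapes.
english = [[("e", 0.6), ("c", 0.4)], [("a", 0.9)], [("t", 0.5), ("l", 0.5)]]
# Same code, different table: a Swedish word with 'a'/'å' confusion.
swedish = [[("f", 0.9)], [("a", 0.6), ("å", 0.4)], [("r", 0.8)]]

print(filter_by_lexicon(english, {"cat", "dog"}))
print(filter_by_lexicon(swedish, {"får"}))
```

Exhaustive enumeration is adequate only for short words with few candidates per position; a practical filter would prune the search, but the modular interface (candidates in, table in, filtered text out) would stay the same.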
