A family of European page readers

We have demonstrated a high degree of automation in the engineering of complex machine vision systems by building ten printed-text page readers, each specialized to a European language, at a pace of one language per week. The page readers provide these functions: page layout analysis, polyfont symbol recognition, typographical morphology, lexicon-driven contextual analysis, and Unicode output encoding. The accuracy and speed of the resulting readers are high enough for practical use and can be improved, if required, by comparatively routine enhancements to individual subsystems. This exercise illustrates the advantages of a research strategy that emphasizes versatility before, but not at the expense of, accuracy and speed.
