Paper information - How to find mathematics on a scanned page

How to find mathematics on a scanned page

We describe the design of document analysis procedures to separate mathematics from ordinary text on a scanned page of mixed material. It is easy to observe that the accuracy of commercial OCR programs is helped by separating mixed material into two (or more) streams, with conventional non-math text handled by the usual OCR text-based-heuristics analysis. The second stream, consisting of material judged to be mathematics, can be fed to a specialized recognizer. If that fails to decode it, it can be passed on to yet a third stream including diagrams, logos, or other miscellaneous material, perhaps including halftones. We explore the extent to which this separation can be automated in the context of scanning archival material for a digital library project including mathematical and scientific journal pages.

Richard J. Fateman

[1] Eric Lecolinet, et al. A Survey of Methods and Strategies in Character Segmentation, 1996, IEEE Trans. Pattern Anal. Mach. Intell.

[2] Mahesh Viswanathan, et al. Syntactic Segmentation and Labeling of Digitized Pages from Technical Journals, 1993, IEEE Trans. Pattern Anal. Mach. Intell.

[3] Dorothea Blostein, et al. Recognition of Mathematical Notation, 1997.

[4] Lawrence O'Gorman, et al. Document Image Analysis, 1996.

[5] Richard J. Fateman, et al. A Suite of Programs for Document Structuring and Image Analysis using Lisp, 1998.

[6] Friedrich M. Wahl, et al. Block segmentation and text extraction in mixed text/image documents, 1982, Comput. Graph. Image Process.

[7] H. Varian. A Model of Sales, 1980.

[8] Hsi-Jian Lee, et al. Design of a mathematical expression understanding system, 1997, Pattern Recognit. Lett.

[9] Masayuki Okamoto, et al. An Experimental Implementation of a Document Recognition System for Papers Containing Mathematical Expressions, 1992.

[10] Lloyd A. Fletcher, Rangachar Kasturi. A Robust Algorithm for Text String Separation from Mixed Text/Graphics Images, 1988, IEEE Trans. Pattern Anal. Mach. Intell.