Sidestepping Optical Character Recognizers: Speculation on a Graphical Method to Produce Large Print

Scientific literature is scarce in large-print format, and this scarcity limits access to the scientific professions for persons with low vision. Traditional devices based on lenses are awkward to use with this literature because they obtain enlargement at the price of a reduced visual field. Other possibilities for access, such as closed-circuit TVs and copy machines that enlarge, have the same flaw. Computers seem to provide a solution, since large-print text on computers can be easily reformatted to fit the size of the output medium. The only problem is that computer systems that enlarge print require input in a special coded format (IBM National Support Center, 1990). There is very little literature available in these formats, and no format standards exist. An optical character recognizer (OCR) overcomes this limitation for many types of text. OCRs are programs that translate the pictorial image of a document into a coded format. The output of an OCR can be used easily by existing products to produce large print, braille, or speech. An OCR is a sophisticated program in that its main job is to comprehend the geometric images of characters on a page.

This report explores an alternative to OCR programs called direct graphical transcription (DGT). Unlike OCR algorithms, this method does not comprehend its data; it merely restructures images of documents in standard print into images of documents in large print. Being computer based, a DGT system can organize its output to fit on any output medium and thus achieve enlargement without truncation. Because a DGT system does not need to comprehend the images it modifies, it may be particularly effective for converting scientific information to a large-print format. The method does have limits; it cannot replace the role of OCRs in translation to braille or speech, nor can it provide the extreme enlargement produced by closed-circuit TVs.
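The reflow idea at the heart of DGT can be illustrated with a small sketch. This is an assumption for illustration only, not a description of any existing system: suppose word images have already been segmented from the page and reduced to their pixel widths. Enlargement then amounts to scaling each word image and greedily repacking the scaled words into lines that fit the output page, so that no word is ever clipped. The names reflow, SCALE, and PAGE_WIDTH are hypothetical.

```python
# Hypothetical sketch of DGT reflow: word images (represented here only
# by their pixel widths) are scaled up and repacked into output lines,
# achieving enlargement without truncation.

SCALE = 3          # enlargement factor (assumed)
PAGE_WIDTH = 600   # output page width in pixels (assumed)
SPACE = 10         # inter-word gap after scaling (assumed)

def reflow(word_widths, scale=SCALE, page_width=PAGE_WIDTH, space=SPACE):
    """Greedily pack scaled word images into lines no wider than the page."""
    lines, current, used = [], [], 0
    for w in word_widths:
        w_scaled = w * scale
        needed = w_scaled if not current else w_scaled + space
        if current and used + needed > page_width:
            lines.append(current)       # start a new output line
            current, used = [], 0
            needed = w_scaled
        current.append(w_scaled)
        used += needed
    if current:
        lines.append(current)
    return lines

# Ten words that fit on one standard-print line now occupy several
# large-print lines, but every word survives intact.
print(reflow([40, 55, 30, 60, 45, 50, 35, 40, 55, 30]))
```

Because the words are treated as opaque images, nothing in this loop needs to recognize a single character; that is the sense in which DGT sidesteps the recognizer.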
Limitations of OCR

Given the highest quality print, the best OCRs will recognize characters with 98 percent accuracy (Schreier & Uslan, 1991). This is extremely accurate, but it may not be precise enough for a professional, especially a mathematician. If the text is complex, such as an advertisement, character recognition can be dramatically reduced (Schreier & Uslan, 1991). Differences in orientation and scaling also inhibit recognition (Long et al., 1984). Wide ranges of scaling are typical in scientific publications. The alphabets accepted by OCRs are limited (Long et al., 1984): not all character fonts can be recognized, and no OCR recognizes the wide range of characters needed to read mathematics. In an attempt to extend its range, an OCR may generalize. As a result, an OCR may encode several different attributes of the same letter into a single character code. For example, the italic and boldface attributes of a lower case e may both be mapped to the single ASCII code for a lower case e. This generalization could seriously contaminate transcription of scientific expositions in which character attributes are used to distinguish between different variables. Until these problems are addressed, OCR technology cannot be considered seriously as a means of access to scientific literature for persons with low vision.

The two-dimensional nature of mathematical notation poses a more serious problem for OCR technology. Images in a mathematical exposition can be thought of as being composed of nested box structures that have been glued together (Knuth, 1984). For example, the relatively simple expression y + √x₁ can be viewed as three boxes glued together: the variable y, the operator +, and the square root. The square root expression is a nested box containing the box x₁. OCRs are oriented toward detecting character sequences, which are one-dimensional structures, so the two-dimensional box constructs of mathematics have little chance of being translated correctly by existing OCRs.
This author can find no sources that address the problem of recognizing the box structures of mathematical notation.
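The nested-box view of mathematical notation can be sketched as a small data structure. This is an illustrative simplification of Knuth's box model, not part of any cited system; the names box and flatten are invented here. The sketch builds the box tree for y + √x₁ and then shows what a one-dimensional scan of it loses.

```python
# Hypothetical sketch of the nested-box model for y + sqrt(x_1):
# each piece of the expression is a box, and composite boxes nest others.

def box(kind, content=None, children=None):
    return {"kind": kind, "content": content, "children": children or []}

expr = box("hbox", children=[          # three boxes glued side by side
    box("char", "y"),                  # the variable y
    box("char", "+"),                  # the operator +
    box("sqrt", children=[             # the radical: a nested box...
        box("hbox", children=[
            box("char", "x"),          # ...containing x
            box("sub", children=[box("char", "1")]),  # ...with subscript 1
        ])
    ]),
])

def flatten(b):
    """A one-dimensional scan, as an OCR performs, keeps only the characters."""
    if b["kind"] == "char":
        return b["content"]
    return "".join(flatten(c) for c in b["children"])

print(flatten(expr))  # yields "y+x1"
```

The linear reading "y+x1" retains every character yet discards the radical and the subscript relationship; that structural information is exactly what a recognizer oriented toward character sequences cannot represent.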