Textual image compression

The authors describe a method for lossless compression of images that contain predominantly typed or typeset text, which they call textual images. An increasingly popular application is document archiving, where documents are scanned by a computer and stored electronically for later retrieval. Their project was motivated by such an application: Trinity College in Dublin, Ireland, is archiving its 1872 printed library catalogues onto disk, and to preserve the exact form of the original documents, pages are being stored as scanned images rather than being converted to text. The test images are taken from this catalogue. These typeset documents have a rather old-fashioned look and contain a wide variety of symbols from several different typefaces: the five test images contain text in English, Flemish, Latin and Greek, and include italics and small capitals as well as roman letters. The catalogue also contains Hebrew, Syriac, and Russian text.