Paper information - Efficient conversion of digital documents to multilayer raster formats

How can the description of a digital (i.e., electronically produced) document be converted into a form that multilayer raster formats can encode efficiently? It is first shown that a foreground/background segmentation without overlapping foreground components can be more efficient for viewing or printing. A new algorithm is then derived from the minimum description length (MDL) criterion; it prevents overlaps between foreground components while optimizing both document quality and compression ratio. This algorithm makes the DjVu compression format significantly more efficient on electronically produced documents. Comparisons with other formats are provided.
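The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes each document component comes with two hypothetical cost estimates in bits (`fg_cost` for coding it in the foreground layer, `bg_cost` for leaving it in the background layer) and a bounding box. A component is placed in the foreground only if doing so shortens the total description and its box does not overlap a component already accepted there. The `Component` class, the cost fields, and the greedy ordering are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    box: tuple       # (x0, y0, x1, y1), half-open pixel rectangle
    fg_cost: float   # hypothetical bits to code this component in the foreground
    bg_cost: float   # hypothetical bits to code it as part of the background


def overlaps(a, b):
    """Return True if two half-open rectangles share at least one pixel."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    return ax0 < bx1 and bx0 < ax1 and ay0 < by1 and by0 < ay1


def segment(components):
    """Greedy MDL-style split into non-overlapping foreground and background."""
    foreground, background = [], []
    # Visit components with the largest bit savings first.
    for c in sorted(components, key=lambda c: c.fg_cost - c.bg_cost):
        saves_bits = c.fg_cost < c.bg_cost
        clear = all(not overlaps(c.box, f.box) for f in foreground)
        if saves_bits and clear:
            foreground.append(c)
        else:
            background.append(c)
    return foreground, background


# Made-up example: two overlapping text runs and one photograph.
components = [
    Component("text_a", (0, 0, 10, 10), fg_cost=100, bg_cost=400),
    Component("text_b", (5, 5, 15, 15), fg_cost=200, bg_cost=300),
    Component("photo", (20, 20, 30, 30), fg_cost=900, bg_cost=500),
]
fg, bg = segment(components)
print([c.name for c in fg])  # ['text_a']
print([c.name for c in bg])  # ['text_b', 'photo']
```

In this toy input, `text_b` would save bits in the foreground but overlaps `text_a`, so it falls back to the background. A real encoder would need actual bit-cost and distortion estimates rather than hand-picked numbers.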

[1] Wayne Niblack, et al. Unsupervised image segmentation using the minimum description length principle, 1992, Proceedings., 11th IAPR International Conference on Pattern Recognition. Vol.II. Conference B: Pattern Recognition Methodology and Systems.

[2] Yoshua Bengio, et al. High quality document image compression with "DjVu", 1998, J. Electronic Imaging.

[3] Steven Pigeon, et al. Lossy compression of partially masked still images, 1998, Proceedings DCC '98 Data Compression Conference (Cat. No.98TB100225).

[4] Yann LeCun, et al. DjVu: analyzing and compressing scanned documents for Internet distribution, 1999, Proceedings of the Fifth International Conference on Document Analysis and Recognition. ICDAR '99 (Cat. No.PR00318).

[5] Daniel P. Huttenlocher, et al. Digipaper: a versatile color document image representation, 1999, Proceedings 1999 International Conference on Image Processing (Cat. 99CH36348).