Reflowable Document Images for the Web

This paper describes ongoing work on a system that transforms page-oriented document images into “reflowable document images”: HTML representations of the page image that adapt to display devices of different sizes while preserving the original appearance of the image as much as possible and avoiding OCR errors. It outlines the system's approach to document layout analysis and discusses the strengths and limitations of HTML for this application.
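
To make the idea concrete, here is a minimal sketch of one way a page image could be turned into reflowable HTML without OCR. It is not the system's actual method: it assumes layout analysis has already produced word bounding boxes in reading order, and the function name `reflowable_html`, the `(x, y, w, h)` box format, and the `gap_px` spacing are all hypothetical. Each word becomes an inline-block element that shows its crop of the page image through a CSS background offset, so the browser can rewrap the words at any display width while the pixels stay those of the original scan.

```python
from html import escape


def reflowable_html(image_url, word_boxes, gap_px=6):
    """Return an HTML paragraph in which each word box is an inline crop.

    image_url  -- URL of the scanned page image (assumed, not from the paper)
    word_boxes -- (x, y, w, h) pixel boxes in reading order (assumed input)
    gap_px     -- horizontal gap between words (illustrative default)
    """
    url = escape(image_url, quote=True)
    spans = []
    for x, y, w, h in word_boxes:
        # Shift the background so that only this word's region is visible.
        style = (
            f"display:inline-block;width:{w}px;height:{h}px;"
            f"margin-right:{gap_px}px;"
            f"background:url('{url}') -{x}px -{y}px no-repeat;"
        )
        spans.append(f'<span style="{style}"></span>')
    # Inline-block elements wrap like words, so the paragraph reflows.
    return "<p>" + "\n".join(spans) + "</p>"
```

Calling `reflowable_html("page.png", [(10, 20, 40, 12), (60, 20, 35, 12)])` yields a paragraph with two spans. A real system would also need to handle line heights, scaling, and non-text regions, which this sketch ignores.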
