Conversion of scanned documents to the open document architecture

The paper presents a system for the conversion of scanned documents into the open document architecture. Unlike previous work in this field, the system combines several evidence sources to achieve greater robustness to document defects and to noise introduced in the scanning process. Furthermore, the authors use optical character recognition in conjunction with other forms of image analysis to detect document structure. This enables enhanced document feature extraction and improved performance. They demonstrate the performance of the system on a specific class of input document.