Logical structure detection for heterogeneous document classes

We present a fully implemented system that uses generic document knowledge to detect the logical structure of documents for which only general layout information is assumed. In particular, we focus on detecting the reading order. Our system integrates components based on computer vision, artificial intelligence, and natural language processing techniques. The distinguishing feature of our framework is its ability to handle documents from heterogeneous collections. The system has been evaluated on a standard collection of documents to measure the quality of the reading order detection. Experimental results for each component and for the system as a whole are presented and discussed in detail. The performance of the system is promising, especially given the diversity of the document collection.
