Document understanding for a broad class of documents

We present a document analysis system that assigns logical labels and extracts the reading order in a broad set of documents. The analysis employs all available information sources, from geometric features and spatial relations to textual features and content. To deal effectively with these information sources, we define a document representation that is general and flexible enough to represent complex documents. To handle such a broad document class, the system uses only generic document knowledge, which is identified explicitly. The proposed system integrates components based on computer vision, artificial intelligence, and natural language processing techniques. The system is fully implemented, and experimental results on heterogeneous collections of documents are presented for each component and for the entire system.
