smartFIX: An Adaptive System for Document Analysis and Understanding

Today the internet is a widespread platform for information interchange, and the semantic web is steadily becoming a reality. However, day-to-day work in companies still requires the laborious manual processing of huge volumes of printed documents. This article presents smartFIX, a document analysis and understanding system developed by the DFKI spin-off insiders. During the research project “adaptive Read”, funded by the German Federal Ministry of Education and Research (BMBF), smartFIX was brought to a considerably higher level of maturity, with a focus on adaptivity. The system extracts information from documents ranging from fixed-format forms to unstructured letters in many layouts. In addition to describing the architecture, the main components, and the characteristics of the system, we present results from applying smartFIX to representative samples of medical bills and prescriptions.
