Three approaches to "industrial" table spotting

This paper introduces three approaches that enable an industrial, comprehensive document analysis system to spot tables in documents. Searching for a set of known table headers (approach 1) works rather well in a significant number of documents. But this approach, although implemented to tolerate OCR errors, is not tolerant enough of some kinds of even minor aberrations. This not only degrades recognition results but, even worse, makes users feel uncomfortable. Pragmatically trying to mimic what the human eye might key on leads to our two further, complementary approaches: searching for layout structures that resemble parts of columns (approach 2), and searching for groupings of similar lines (approach 3). To suit our system, the approaches must be very simple to implement and to explain to users, computationally cheap, and combinable. In the domain of health insurers, who receive huge amounts of so-called medical liquidations on a daily basis, we obtain very good results. On document samples representative of the everyday practice of five customers (health insurance companies), tables were spotted as well and as fast as the customers expected of the system. We thus consider our current approaches a step towards cognitive adequacy.
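As a rough illustration of approach 1, the sketch below marks a text line as a table header candidate when enough of its words approximately match entries from a list of known headers. It is not the system's implementation: the function names, the similarity measure (Python's `difflib.SequenceMatcher`), the threshold of 0.7, and the minimum of two matching headers are all assumptions chosen for the example.

```python
from difflib import SequenceMatcher


def fuzzy_equal(word, header, threshold=0.7):
    """Return True if two words are similar enough to absorb small OCR errors.

    The threshold is an assumed value for illustration, not one from the paper.
    """
    ratio = SequenceMatcher(None, word.lower(), header.lower()).ratio()
    return ratio >= threshold


def spot_header_lines(lines, known_headers, min_hits=2, threshold=0.7):
    """Return the indices of lines that look like table header lines.

    A line qualifies when at least `min_hits` of the known headers are
    approximately matched by some word on that line (hypothetical rule).
    """
    candidates = []
    for index, line in enumerate(lines):
        words = line.split()
        hits = sum(
            1
            for header in known_headers
            if any(fuzzy_equal(word, header, threshold) for word in words)
        )
        if hits >= min_hits:
            candidates.append(index)
    return candidates


# Made-up sample lines with typical OCR confusions ("i" -> "1", "m" -> "rn").
sample_lines = [
    "Invoice no. 4711",
    "Date   Descr1ption   Arnount",
    "01.02.  Consultation  20.00",
]
known = ["date", "description", "amount"]
header_indices = spot_header_lines(sample_lines, known)
```

In this sketch only the second sample line is reported, since its misread words still resemble the known headers. A header that deviates more strongly, for example an unknown synonym, would be missed, which is the kind of weakness that motivates the layout-based approaches 2 and 3.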
