Pre-Printed and Hand-Filled Table-Form Analysis Aiming Cell Extraction

This paper presents an approach to extracting the structure of pre-printed and hand-filled table-forms. The first module identifies cells using the watershed transform. The second module detects wrong cells, that is, spurious cells produced by handwritten and/or pre-printed data; it separates these wrong cells from the remaining cells through an analysis of compactness, perimeter and area. The third module merges the wrong cells with other cells to determine the exact table structure. A heterogeneous database of 300 pre-printed and hand-filled table-form images was used to evaluate the efficiency of the methodology. Experiments showed significant and promising results.
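The filtering and merging modules can be illustrated with a short Python sketch. It assumes the first module has already produced a label image (each positive integer is a candidate cell, 0 marks table lines or background), measures the area and pixel-edge perimeter of every cell, and uses the isoperimetric ratio 4πA/P² as the compactness measure. Each flagged cell is then merged into the valid neighbour with which it shares the longest border. The compactness formula, the thresholds, the merge rule and all function names here are illustrative assumptions, not the definitions used in the paper.

```python
import math
from collections import Counter

# Hypothetical input format: a label image (list of rows) in which each
# positive int is a candidate cell (e.g. from a watershed segmentation)
# and 0 marks table lines or background. Illustrative sketch only.
Grid = list[list[int]]
NEIGHBOURS_4 = ((-1, 0), (1, 0), (0, -1), (0, 1))


def region_stats(labels: Grid) -> dict[int, tuple[int, int]]:
    """Return {label: (area, perimeter)}; perimeter counts boundary pixel edges."""
    h, w = len(labels), len(labels[0])
    area: Counter = Counter()
    perimeter: Counter = Counter()
    for y in range(h):
        for x in range(w):
            lab = labels[y][x]
            if lab == 0:
                continue
            area[lab] += 1
            # Each 4-neighbour that lies outside the image or belongs to
            # another label contributes one edge to the perimeter.
            for dy, dx in NEIGHBOURS_4:
                ny, nx = y + dy, x + dx
                if not (0 <= ny < h and 0 <= nx < w) or labels[ny][nx] != lab:
                    perimeter[lab] += 1
    return {lab: (area[lab], perimeter[lab]) for lab in area}


def compactness(area: int, perimeter: int) -> float:
    """Assumed compactness measure: isoperimetric ratio 4*pi*A / P**2."""
    return 4 * math.pi * area / perimeter ** 2


def find_wrong_cells(stats, min_area=4, min_perimeter=8, min_compactness=0.3):
    """Flag cells that are too small or not compact enough (hypothetical thresholds)."""
    return {
        lab
        for lab, (a, p) in stats.items()
        if a < min_area or p < min_perimeter or compactness(a, p) < min_compactness
    }


def merge_wrong_cells(labels: Grid, wrong: set[int]) -> Grid:
    """Relabel each wrong cell as the valid neighbour sharing its longest border."""
    h, w = len(labels), len(labels[0])
    shared = {lab: Counter() for lab in wrong}
    for y in range(h):
        for x in range(w):
            lab = labels[y][x]
            if lab not in wrong:
                continue
            # Count border edges shared with valid (non-wrong, non-line) cells.
            for dy, dx in NEIGHBOURS_4:
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    other = labels[ny][nx]
                    if other != 0 and other not in wrong:
                        shared[lab][other] += 1
    # A wrong cell with no valid neighbour keeps its original label.
    target = {lab: c.most_common(1)[0][0] for lab, c in shared.items() if c}
    return [[target.get(lab, lab) for lab in row] for row in labels]


# Usage: label 3 is a one-pixel fragment inside cell 2.
labels = [
    [1, 1, 1, 1, 0, 2, 2, 2, 2],
    [1, 1, 1, 1, 0, 2, 2, 2, 2],
    [1, 1, 1, 1, 0, 2, 2, 3, 2],
    [1, 1, 1, 1, 0, 2, 2, 2, 2],
]
stats = region_stats(labels)  # {1: (16, 16), 2: (15, 20), 3: (1, 4)}
wrong = find_wrong_cells(stats)  # {3}
merged = merge_wrong_cells(labels, wrong)
print(merged[2])  # [1, 1, 1, 1, 0, 2, 2, 2, 2]
```

Merging only into cells that passed the filter avoids chains in which one wrong cell is absorbed by another wrong cell. The sketch also assumes a wrong cell touches its parent cell directly; fragments separated from their parent by line pixels would need a different adjacency rule.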