A table-form extraction with artefact removal

We present a novel methodology for extracting the structure of table-forms that have been filled in by hand. The method identifies the line intersections of the table-form, and detects and corrects wrong intersections produced by faulty line segments or by table artefacts such as overlapping data, broken segments, and smudges. A novel method for identifying and deleting these artefacts is also proposed. The final step extracts the table-form cells. An evaluation on a database of 350 table-form images shows that the artefact identification method improves the performance of the table-form structure extractor. The proposed approach reached a success rate of 85%.
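The abstract does not give implementation details, so the following is only a minimal sketch of the general idea of intersection detection followed by cell extraction, not the authors' method. It assumes that line segments have already been detected and snapped to shared coordinates, that the table is a regular grid, and that a small pixel tolerance is enough to bridge a broken segment. The names `intersections`, `cells`, and `tol` are hypothetical.

```python
from typing import List, Set, Tuple

HSeg = Tuple[int, int, int]  # horizontal segment: (y, x_start, x_end)
VSeg = Tuple[int, int, int]  # vertical segment:   (x, y_start, y_end)
Point = Tuple[int, int]      # intersection: (x, y)


def intersections(h_segs: List[HSeg], v_segs: List[VSeg], tol: int = 3) -> List[Point]:
    """Return the (x, y) points where a horizontal and a vertical segment meet.

    `tol` (an assumed parameter) lets a segment that stops a few pixels
    short of the crossing line still count, which is one simple way to
    tolerate broken segments.
    """
    points: Set[Point] = set()
    for y, x0, x1 in h_segs:
        for x, y0, y1 in v_segs:
            if x0 - tol <= x <= x1 + tol and y0 - tol <= y <= y1 + tol:
                points.add((x, y))
    return sorted(points)


def cells(points: List[Point]) -> List[Tuple[int, int, int, int]]:
    """Return (left, top, right, bottom) for every grid rectangle whose
    four corners are all detected intersections (regular grids only)."""
    pts = set(points)
    xs = sorted({x for x, _ in pts})
    ys = sorted({y for _, y in pts})
    found = []
    for top, bottom in zip(ys, ys[1:]):
        for left, right in zip(xs, xs[1:]):
            corners = {(left, top), (right, top), (left, bottom), (right, bottom)}
            if corners <= pts:
                found.append((left, top, right, bottom))
    return found


# A 2x2 table whose middle vertical line stops 2 pixels short of the bottom border.
h = [(0, 0, 100), (50, 0, 100), (100, 0, 100)]
v = [(0, 0, 100), (50, 0, 98), (100, 0, 100)]

tolerant = cells(intersections(h, v, tol=3))  # gap bridged by the tolerance
strict = cells(intersections(h, v, tol=0))    # gap leaves one corner undetected
```

In this toy input the tolerant run recovers all four cells, while the strict run loses the two bottom cells because their shared corner is never detected. This illustrates why faulty intersections have to be corrected before the cells are extracted.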
