Automatic Table Detection in Document Images

We propose a novel technique for automatic table detection in document images. Lines and tables are among the most frequent graphic, non-textual entities in documents, and their detection is directly related to both OCR performance and document layout description. The proposed workflow comprises three distinct steps: (i) image pre-processing; (ii) horizontal and vertical line detection; and (iii) table detection. We demonstrate the efficiency of the method with a performance evaluation scheme that covers a wide variety of documents, including forms, newspapers/magazines, scientific journals, tickets/bank cheques, certificates and handwritten documents.
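
The three-step workflow can be illustrated with a minimal sketch. It is not the authors' algorithm, since the abstract does not give the details of any step. It assumes a fixed-threshold binarization for step (i), a plain run-length scan for step (ii), and a simple rule for step (iii): the bounding box of all lines that cross a line of the other orientation. All function names, the threshold of 128, and the `min_len` parameter are hypothetical choices made for illustration.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]       # grayscale rows: 0 = black, 255 = white
Line = Tuple[int, int, int]   # (fixed coordinate, start, end), end inclusive
Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def binarize(gray: Image, threshold: int = 128) -> List[List[bool]]:
    """Step (i), simplified: True marks an ink (dark) pixel."""
    return [[px < threshold for px in row] for row in gray]


def find_runs(cells: List[bool], min_len: int) -> List[Tuple[int, int]]:
    """Return (start, end) for each run of True values of length >= min_len."""
    runs, start = [], None
    for i, on in enumerate(cells + [False]):  # sentinel closes a trailing run
        if on and start is None:
            start = i
        elif not on and start is not None:
            if i - start >= min_len:
                runs.append((start, i - 1))
            start = None
    return runs


def detect_lines(ink: List[List[bool]], min_len: int) -> Tuple[List[Line], List[Line]]:
    """Step (ii), simplified: horizontal lines as (row, x0, x1),
    vertical lines as (column, y0, y1)."""
    horizontal = [(y, x0, x1)
                  for y, row in enumerate(ink)
                  for x0, x1 in find_runs(row, min_len)]
    columns = [list(col) for col in zip(*ink)]
    vertical = [(x, y0, y1)
                for x, col in enumerate(columns)
                for y0, y1 in find_runs(col, min_len)]
    return horizontal, vertical


def detect_table(horizontal: List[Line], vertical: List[Line]) -> Optional[Box]:
    """Step (iii), simplified: bounding box of the lines that cross a line of
    the other orientation; None unless at least 2 horizontal and 2 vertical
    lines take part in a crossing."""
    crossing_h, crossing_v = set(), set()
    for h in horizontal:
        y, x0, x1 = h
        for v in vertical:
            x, y0, y1 = v
            if x0 <= x <= x1 and y0 <= y <= y1:
                crossing_h.add(h)
                crossing_v.add(v)
    if len(crossing_h) < 2 or len(crossing_v) < 2:
        return None
    xs = [c for _, x0, x1 in crossing_h for c in (x0, x1)] + [x for x, _, _ in crossing_v]
    ys = [y for y, _, _ in crossing_h] + [c for _, y0, y1 in crossing_v for c in (y0, y1)]
    return min(xs), min(ys), max(xs), max(ys)


# Usage: a white 8x6 image with a one-pixel rectangular frame drawn on it.
page = [[255] * 8 for _ in range(6)]
for x in range(1, 7):
    page[1][x] = page[4][x] = 0
for y in range(1, 5):
    page[y][1] = page[y][6] = 0

h_lines, v_lines = detect_lines(binarize(page), min_len=3)
print(detect_table(h_lines, v_lines))  # (1, 1, 6, 4)
```

A real system would need adaptive binarization, tolerance for skew and broken lines, and a stricter test of which line groups form a table. The sketch shows only how the three stages feed one another.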
