Paper Information - Learning to detect tables in document images using line and text information

Table detection is a crucial step in many document analysis applications, because tables present essential information to readers in a structured manner. It remains a challenging problem because of the variety of table structures and the complexity of document layouts. This paper presents a hybrid method that detects table zones in three fundamental steps: classifying the regions, detecting tables that consist of intersecting horizontal and vertical lines, and identifying tables made up of only parallel lines. Experiments on the UW-III dataset show promising results.
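
The abstract distinguishes tables ruled by intersecting horizontal and vertical lines from tables ruled only by parallel lines. It does not describe how the paper extracts or classifies lines, so the following is only a minimal illustrative sketch of that distinction, not the authors' method. It assumes a small binary image given as nested lists (1 = ink) and one-pixel-thick rulings; the function names and the `min_len` threshold are hypothetical.

```python
from typing import List, Sequence, Tuple

# A run is (fixed_index, start, end) with inclusive bounds.
Run = Tuple[int, int, int]


def horizontal_runs(img: Sequence[Sequence[int]], min_len: int) -> List[Run]:
    """Return (row, col_start, col_end) for every run of ink at least min_len long."""
    runs: List[Run] = []
    for r, row in enumerate(img):
        start = None
        # A trailing 0 acts as a sentinel so a run touching the right edge is closed.
        for c, value in enumerate(list(row) + [0]):
            if value and start is None:
                start = c
            elif not value and start is not None:
                if c - start >= min_len:
                    runs.append((r, start, c - 1))
                start = None
    return runs


def vertical_runs(img: Sequence[Sequence[int]], min_len: int) -> List[Run]:
    """Return (col, row_start, row_end) by scanning the transposed image."""
    transposed = [list(col) for col in zip(*img)]
    return horizontal_runs(transposed, min_len)


def classify_line_layout(img: Sequence[Sequence[int]], min_len: int = 3) -> str:
    """Label a region 'intersecting', 'parallel', or 'none' (hypothetical labels)."""
    h = horizontal_runs(img, min_len)
    v = vertical_runs(img, min_len)
    # A horizontal and a vertical run cross if they share a pixel.
    crossing = any(
        c_start <= col <= c_end and r_start <= row <= r_end
        for (row, c_start, c_end) in h
        for (col, r_start, r_end) in v
    )
    if crossing:
        return "intersecting"
    if len(h) >= 2 or len(v) >= 2:
        return "parallel"
    return "none"


ruled = [
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
]
parallel_only = [
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
]
print(classify_line_layout(ruled))
print(classify_line_layout(parallel_only))
```

On real scans, rulings are several pixels thick, skewed, and broken, so a practical detector would need thresholds well above the line thickness plus some tolerance for gaps. The region classification step mentioned in the abstract is not covered by this sketch.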
