A survey of table recognition

Abstract. Table characteristics vary widely. Consequently, a great variety of computational approaches have been applied to table recognition. In this survey, the table recognition literature is presented as an interaction of table models, observations, transformations, and inferences. A table model defines the physical and logical structure of tables; the model is used to detect tables and to analyze and decompose the detected tables. Observations perform feature measurements and data lookup, transformations alter or restructure data, and inferences generate and test hypotheses. This presentation clarifies both the decisions made by a table recognizer and the assumptions and inferencing techniques that underlie these decisions.
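
To make the four roles concrete, the following is a minimal sketch for plain-text input. It is not taken from the survey or from any system it covers: the names `TableModel`, `observe_cell_count`, `transform_to_cells`, and `infer_tables` are hypothetical, and the model assumed here (a table is a run of consecutive lines with the same number of cells, where cells are separated by two or more spaces) is a deliberately simple stand-in for the much richer models found in the literature.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class TableModel:
    """Hypothetical model: a table is a run of lines sharing a cell count."""
    min_rows: int = 2
    min_cols: int = 2


def observe_cell_count(line: str) -> int:
    """Observation: measure how many cells a line has (split on 2+ spaces)."""
    stripped = line.strip()
    return len(re.split(r"\s{2,}", stripped)) if stripped else 0


def transform_to_cells(line: str) -> list[str]:
    """Transformation: restructure a raw line into a list of cell strings."""
    return re.split(r"\s{2,}", line.strip())


def infer_tables(lines: list[str], model: TableModel) -> list[list[list[str]]]:
    """Inference: hypothesize that consecutive lines with equal cell counts
    form a table, then test each hypothesis against the model."""
    tables: list[list[list[str]]] = []
    run: list[str] = []
    run_cols = 0
    for line in lines + [""]:  # the empty sentinel flushes the final run
        cols = observe_cell_count(line)
        if cols >= model.min_cols and cols == run_cols:
            run.append(line)  # the line supports the current hypothesis
            continue
        if len(run) >= model.min_rows:  # test the finished hypothesis
            tables.append([transform_to_cells(r) for r in run])
        # start a new hypothesis only if the line could be a table row
        run, run_cols = ([line], cols) if cols >= model.min_cols else ([], 0)
    return tables


sample = [
    "Quarterly results follow.",
    "Region  Q1  Q2",
    "North   10  12",
    "South   7   9",
    "End of report.",
]
detected = infer_tables(sample, TableModel())
```

Here the model supplies the thresholds, the observation measures a feature of each line, the inference groups lines and accepts or rejects each candidate run, and the transformation decomposes accepted rows into cells. Changing `min_rows` or `min_cols` changes which runs are accepted, which shows in miniature how the assumptions in a table model drive a recognizer's decisions.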
