Towards a theory of tables

Tables appearing in natural language documents provide a compact way to present relational information in an immediate and intuitive manner, while simultaneously organizing and indexing that information. Despite their ubiquity and obvious utility, tables have not received the same level of formal characterization as sentential text. Instead, they are typically modeled in terms of geometry, simple hierarchies of strings, and database-like relational structures. Tables have been the focus of a large body of research in document image analysis and, more recently, have received particular attention from researchers interested in extracting information from non-trivial elements of web pages. This paper provides a framework for representing tables at both the semantic and the structural level. It presents a representation of the indexing structures found in tables and of the relationship between these structures and the underlying categories.
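
To make the idea of an "indexing structure" over categories concrete, here is a minimal sketch in which every data cell is addressed by choosing exactly one label from each category (for example, one year and one term). It is an illustration only, not the framework proposed in this paper. It assumes flat categories, although real table headers often form hierarchies. The class `CategoricalTable` and its methods `put`, `get`, and `select` are hypothetical names introduced for this example.

```python
from itertools import product
from typing import Dict, List, Tuple


class CategoricalTable:
    """Hypothetical sketch: cells indexed by one label per (flat) category."""

    def __init__(self, categories: Dict[str, List[str]]):
        self.names = list(categories)      # category order fixes the index-tuple order
        self.categories = categories
        self.cells: Dict[Tuple[str, ...], object] = {}

    def _key(self, labels: Dict[str, str]) -> Tuple[str, ...]:
        # Build the index tuple, checking that each label belongs to its category
        key = []
        for name in self.names:
            label = labels[name]
            if label not in self.categories[name]:
                raise KeyError(f"{label!r} is not a label of category {name!r}")
            key.append(label)
        return tuple(key)

    def put(self, value: object, **labels: str) -> None:
        self.cells[self._key(labels)] = value

    def get(self, **labels: str) -> object:
        return self.cells[self._key(labels)]

    def select(self, **fixed: str) -> Dict[Tuple[str, ...], object]:
        # Return every stored cell whose index agrees with the fixed labels (a slice)
        free = [n for n in self.names if n not in fixed]
        out: Dict[Tuple[str, ...], object] = {}
        for combo in product(*(self.categories[n] for n in free)):
            key = self._key({**fixed, **dict(zip(free, combo))})
            if key in self.cells:
                out[key] = self.cells[key]
        return out


t = CategoricalTable({"Year": ["1991", "1992"], "Term": ["Winter", "Fall"]})
t.put(10, Year="1991", Term="Winter")
t.put(12, Year="1991", Term="Fall")
t.put(15, Year="1992", Term="Winter")
print(t.get(Year="1991", Term="Fall"))  # 12
print(t.select(Year="1991"))            # {('1991', 'Winter'): 10, ('1991', 'Fall'): 12}
```

Fixing some categories and letting the others vary corresponds to reading along a row or column of the printed table. Supporting hierarchical headers would require replacing each flat label list with a tree of labels.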