Identifying and understanding tabular material in compound documents

Tables are important components of technical documents. This paper addresses two problems: (i) identifying a tabular component in a scanned image of a compound document that contains text, drawings, diagrams, and other material; and (ii) understanding the content of the table so that it can be converted into electronic format. As far as the authors are aware, these problems have not been addressed before. An algorithm that performs both tasks has been studied and implemented. Preliminary experimental results indicate satisfactory performance for many table layout styles.
