Table Structure Identification from Document Images: A Survey

Table structure identification has received significant research attention in the past few years. OCR (Optical Character Recognition) is prone to many errors on document images, so logical objects such as tables need to be made explicit. A deep understanding of a document's contents therefore requires a proper understanding of its structure, which is obtained through a variety of algorithms. This paper describes methods and algorithms that segment scanned images into blocks and detect tabular structure, in whatever form it appears in the document.

INTRODUCTION

A large number of pages must be scanned and analyzed to create document image libraries targeted at real-world applications. Creating a document image library involves a chain of thorough and intensive activities such as scanning, pre-processing, segmentation, layout analysis, storage, and retrieval. Although this is among the most researched areas of Document Image Analysis (DIA), its problems have yet to be solved to the desired level of accuracy and efficiency. Many methods and algorithms have been proposed for detecting and segmenting tabular structure in document images, each with its own benefits and limitations. Classifying a page by whether it contains any tabular structure leads to better segmentation at a lower computing cost. The term “tabular structure” refers to a table-like arrangement of content. In this paper we present a brief review of past work under the Table category.

Table: A table contains at least two rows and two columns, which may be fully or partially embedded in boxes formed by horizontal and vertical rule lines. Table detection and segmentation have been approached in several ways over time.
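The definition of a table given above can be made concrete with a small sketch. The Python code below checks whether a binary image contains enough long horizontal and vertical rule lines to enclose at least two rows and two columns. It covers only the fully ruled case, not tables that are partially boxed or have no rule lines. The function names, the `min_fraction` threshold, and the toy image are illustrative assumptions and are not taken from any of the surveyed methods.

```python
from typing import List

Image = List[List[int]]  # binary image: 1 = ink, 0 = background


def find_rule_lines(image: Image, min_fraction: float = 0.8) -> List[int]:
    """Return the starting row index of each horizontal rule line.

    A row counts as part of a rule line when ink covers at least
    min_fraction of its width (an assumed threshold). Consecutive
    qualifying rows are merged into a single thick line.
    """
    lines = []
    previous_was_line = False
    for y, row in enumerate(image):
        is_line = bool(row) and sum(row) >= min_fraction * len(row)
        if is_line and not previous_was_line:
            lines.append(y)
        previous_was_line = is_line
    return lines


def transpose(image: Image) -> Image:
    """Swap rows and columns so vertical lines can be found as rows."""
    return [list(column) for column in zip(*image)]


def looks_like_ruled_table(image: Image, min_fraction: float = 0.8) -> bool:
    """Check for a rule-line grid with at least two rows and two columns.

    n parallel rule lines bound n - 1 bands, so two rows need at least
    three horizontal lines and two columns need three vertical lines.
    """
    horizontal = find_rule_lines(image, min_fraction)
    vertical = find_rule_lines(transpose(image), min_fraction)
    return len(horizontal) >= 3 and len(vertical) >= 3


def make_grid(size: int = 7, step: int = 3) -> Image:
    """Build a toy image with a rule line on every step-th row and column."""
    return [
        [1 if (x % step == 0 or y % step == 0) else 0 for x in range(size)]
        for y in range(size)
    ]


if __name__ == "__main__":
    print(find_rule_lines(make_grid()))         # [0, 3, 6]
    print(looks_like_ruled_table(make_grid()))  # True
```

A real scanned page would first need binarization, skew correction, and tolerance for broken or faint lines, none of which this sketch attempts.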
The algorithms may be classified broadly into two types:
1. Those based on the presence of rule lines in the table.
2. Those based on knowledge of the table layout.

Identification of Table: Our main topic of interest is the identification of tables, which can be carried out in the following steps:
1. Table Detection: Locating the regions of a document that contain tabular content.
2. Table Structure Recognition: Reconstructing the cellular structure of a table.
3. Table Interpretation: Rediscovering the meaning of the tabular structure. This includes:

International Journal of Innovations & Advancement in Computer Science