Zoning documents increases the resolution of indexingfrom the image level to the field level. A line-delimited tabulardocument forms a well defined series of regions. However,as image quality decrease...
Within the scientific literature, tables are commonly used to present factual and statistical information in a compact way, which is easy to digest by readers. The ability to “understand” the structur...
With the widespread use of mobile phones and scanners to photograph and upload documents, the need for extracting the information trapped in unstructured document images such as retail receipts, insur...
We present a novel methodology for extracting the structure of handwritten filled table-forms. The method identifies the table-form line intersections, detecting and correcting wrong intersections pro...
We present TableBank, a new image-based table detection and recognition dataset built with novel weak supervision from Word and Latex documents on the internet. Existing research for image-based table...
Translating document renderings (e.g. PDFs, scans) into hierarchical structures is extensively demanded in the daily routines of many real-world applications, and is often a prerequisite step of many ...
This paper summarizes the core idea of the T-Recs table recognition system, an integrated system covering block-segmentation, table location and a model-free structural analysis of tables. T-Recs work...
This paper presents an approach to extract the structure of pre-printed and hand-filled table-forms. The first module performs the cell identification based on Watershed transform. A second module det...
This paper presents a novel method for the analysis of tabular structures in document images using the potential of deformable convolutional networks. In order to assess the suitability of the model t...
This paper presents a novel end-to-end system for table understanding in document images called DeepDeSRT. In particular, the contribution of DeepDeSRT is two-fold. First, it presents a deep learning-...
This paper introduces three approaches for an industrial, comprehensive document analysis system to enable it to spot tables in documents. Searching for a set of known table headers (approach 1) works...
The paper presents a complete solution for recognition of textual and graphic structures in various types of documents acquired from the Internet. In the proposed approach, the document structure reco...
The explosive growth and popularity of the Web has resulted in a huge amount of digital information sources on the Internet. Unfortunately, such sources only manage data, rather than the knowledge the...
Tables present summarized and structured information to the reader, which makes table's structure extraction an important part of document understanding applications. However, table structure identifi...
Table localization and segmentation is an important but critical step in document image analysis. Table segmentation is much harder than table localization particularly in the invoice document because...
Localizing page elements/objects such as tables, figures, equations, etc. is the primary step in extracting information from document images. We propose a novel end-to-end trainable deep network, (CDe...
Large-scale, multiterabyte digital libraries are becoming feasible due to decreasing costs of storage, CPU, and bandwidth. However, costs associated with preparing content for input into the library r...
It is said that the world knowledge is in the Internet. Scientific knowledge is in the books, journals and conference proceedings. Yet both repositories are too large to skim through manually. We need...
In this work, we present a novel and generic approach, Figure and Formula Detector (FFD) to detect the formulas and figures from document images. Our proposed method employs traditional computer visio...
In this paper, we present document information content (i.e. text fields) extraction technique via graph mining. Real-world users first provide a set of key text fields from the document image which t...