Paper Citations

LakeBench: Benchmarks for Data Discovery over Data Lakes

I. Abdelaziz, Kavitha Srinivas, Tejaswini Pedapati et al.,

2023,

ArXiv

Within enterprises, there is a growing need to intelligently navigate data lakes, specifically focusing on data discovery. Of particular importance to enterprises is the ability to find related tables...

Robust Table Structure Recognition with Dynamic Queries Enhanced Detection Transformer

Qiang Huo, Lei Sun, Weihong Lin et al.,

2023,

Pattern Recognit.

We present a new table structure recognition (TSR) approach, called TSRFormer, to robustly recognize the structures of complex tables with geometrical distortions from various table images. Unlike p...

DocTr: Document Transformer for Structured Information Extraction in Documents

Z. Tu, Aruni RoyChowdhury, R. Manmatha et al.,

2023 IEEE/CVF International Conference on Computer Vision (ICCV)

We present a new formulation for structured information extraction (SIE) from visually rich documents. It aims to address the limitations of existing IOB tagging or graph-based formulations, which are...

SciA11y: Converting Scientific Papers to Accessible HTML

Daniel S. Weld, Lucy Lu Wang, Jonathan Bragg et al.,

2021,

ASSETS

We present SciA11y, a system that renders inaccessible scientific paper PDFs into HTML. SciA11y uses machine learning models to extract and understand the content of scientific PDFs, and reorganizes t...

KAAPA: Knowledge Aware Answers from PDF Analysis

David Konopnicki, Avirup Sil, Mustafa Canim et al.,

2021,

AAAI

We present KAAPA (Knowledge Aware Answers from PDF Analysis), an integrated solution for machine reading comprehension over both text and tables extracted from PDFs. KAAPA enables interactive question...

Show, Read and Reason: Table Structure Recognition with Flexible Context Aggregator

Rongrong Ji, Deqiang Jiang, Xin Li et al.,

2021,

ACM Multimedia

We investigate the challenging problem of table structure recognition in this work. Many recent methods adopt graph-based context aggregators with strong inductive bias to reason sparse contextual rela...

Parsing Table Structures in the Wild

Gui-Song Xia, Nan Xue, Rujiao Long et al.,

2021 IEEE/CVF International Conference on Computer Vision (ICCV)

This paper tackles the problem of table structure parsing (TSP) from images in the wild. In contrast to existing studies that mainly focus on parsing well-aligned tabular images with simple layouts fr...

An Accuracy-Maximization Approach for Claims Classifiers in Document Content Analytics for Cybersecurity

K. Perumalla, H. Sharif, M. Hempel et al.,

2022,

J. Cybersecur. Priv.

This paper presents our research approach and findings towards maximizing the accuracy of our classifier of feature claims for cybersecurity literature analytics, and introduces the resulting model Cl...

Beyond Document Page Classification: Design, Datasets, and Challenges

Matthew B. Blaschko, Jordy Van Landeghem, M. Moens et al.,

2023,

ArXiv

This paper highlights the need to bring document classification benchmarking closer to real-world applications, both in the nature of data tested ($X$: multi-channel, multi-paged, multi-industry; $Y$:...

AttesTable at SemEval-2021 Task 9: Extending Statement Verification with Tables for Unknown Class, and Semantic Evidence Finding

Aadish Jain, Pratik Ratadiya, Harshit Varma et al.,

2021,

SEMEVAL

This paper describes our approach for Task 9 of SemEval 2021: Statement Verification and Evidence Finding with Tables. We participated in both subtasks, namely statement verification and evidence find...

Design of a Novel Information System for Semi-automated Management of Cybersecurity in Industrial Control Systems

K. Perumalla, H. Sharif, M. Hempel et al.,

2022,

ACM Trans. Manag. Inf. Syst.

There is an urgent need in many critical infrastructure sectors, including the energy sector, for attaining detailed insights into cybersecurity features and compliance with cybersecurity requirements...

Retrieving Complex Tables with Multi-Granular Graph Representation Learning

Pedro A. Szekely, Jay Pujara, Kexuan Sun et al.,

2021,

SIGIR

The task of natural language table retrieval (NLTR) seeks to retrieve semantically relevant tables based on natural language queries. Existing learning systems for this task often treat tables as plai...

FinQA: A Dataset of Numerical Reasoning over Financial Data

Sameena Shah, Ting-Hao Huang, Bryan R. Routledge et al.,

2021,

EMNLP

The sheer volume of financial statements makes it difficult for humans to access and analyze a business’s financials. Robust numerical reasoning likewise faces unique challenges in thi...

Are ChatGPT and GPT-4 General-Purpose Solvers for Financial Text Analytics? A Study on Several Typical Tasks

Xiaomo Liu, Xianzhi Li, Zhiqiang Ma et al.,

2023,

EMNLP

The most recent large language models (LLMs) such as ChatGPT and GPT-4 have shown exceptional capabilities of generalist models, achieving state-of-the-art performance on a wide range of NLP tasks with...

DocILE 2023 Teaser: Document Information Localization and Extraction

Ahmed Hamdi, Štěpán Šimsa, Milan Šulc et al.,

2023,

ECIR

The lack of data for information extraction (IE) from semi-structured business documents is a real problem for the IE community. Publications relying on large-scale datasets use only proprietary, unpu...

Current Status and Performance Analysis of Table Recognition in Document Images With Deep Neural Networks

Didier Stricker, Marcus Liwicki, Muhammad Zeshan Afzal et al.,

2021,

IEEE Access

The first phase of table recognition is to detect the tabular area in a document. Subsequently, the tabular structures are recognized in the second phase in order to extract information from the respe...

TDeLTA: A Light-weight and Robust Table Detection Method based on Learning Text Arrangement

Xiangping Wu, Qingcai Chen, Yang Fan et al.,

2023,

AAAI

The diversity of tables makes table detection a great challenge, leading to existing models becoming more tedious and complex. Despite achieving high performance, they often overfit to the table style...

TC-OCR: TableCraft OCR for Efficient Detection & Recognition of Table Structure & Content

R. Shah, Siddhesh Bangar, Avinash Anand et al.,

2023,

MMIR@MM

The automatic recognition of tabular data in document images presents a significant challenge due to the diverse range of table styles and complex structures. Tables offer valuable content representat...

CORD-19: The Covid-19 Open Research Dataset

Oren Etzioni, Boya Xie, Kuansan Wang et al.,

2020,

NLPCOVID19

The COVID-19 Open Research Dataset (CORD-19) is a growing resource of scientific papers on COVID-19 and related historical coronavirus research. CORD-19 is designed to facilitate the development of te...

WEATHERGOV+: A Table Recognition and Summarization Dataset to Bridge the Gap Between Document Image Analysis and Natural Language Generation

A. Albu, Melissa Cote, Amanda Dash,

2023,

DocEng

Tables, ubiquitous in data-oriented documents like scientific papers and financial statements, organize and convey relational information. Automatic table recognition from document images, which invol...