Citing Papers

Within enterprises, there is a growing need to intelligently navigate data lakes, specifically focusing on data discovery. Of particular importance to enterprises is the ability to find related tables...

Qiang Huo, Lei Sun, Weihong Lin et al.,
2023,
Pattern Recognit.

We present a new table structure recognition (TSR) approach, called TSRFormer, to robustly recognize the structures of complex tables with geometrical distortions from various table images. Unlike p...

Z. Tu, Aruni RoyChowdhury, R. Manmatha et al.,
2023 IEEE/CVF International Conference on Computer Vision (ICCV)

We present a new formulation for structured information extraction (SIE) from visually rich documents. It aims to address the limitations of existing IOB tagging or graph-based formulations, which are...

Daniel S. Weld, Lucy Lu Wang, Jonathan Bragg et al.,
2021,
ASSETS

We present SciA11y, a system that renders inaccessible scientific paper PDFs into HTML. SciA11y uses machine learning models to extract and understand the content of scientific PDFs, and reorganizes t...

We present KAAPA (Knowledge Aware Answers from PDF Analysis), an integrated solution for machine reading comprehension over both text and tables extracted from PDFs. KAAPA enables interactive question...

Rongrong Ji, Deqiang Jiang, Xin Li et al.,
2021,
ACM Multimedia

We investigate the challenging problem of table structure recognition in this work. Many recent methods adopt a graph-based context aggregator with strong inductive bias to reason sparse contextual rela...

Gui-Song Xia, Nan Xue, Rujiao Long et al.,
2021 IEEE/CVF International Conference on Computer Vision (ICCV)

This paper tackles the problem of table structure parsing (TSP) from images in the wild. In contrast to existing studies that mainly focus on parsing well-aligned tabular images with simple layouts fr...

K. Perumalla, H. Sharif, M. Hempel et al.,
2022,
J. Cybersecur. Priv.

This paper presents our research approach and findings towards maximizing the accuracy of our classifier of feature claims for cybersecurity literature analytics, and introduces the resulting model Cl...

This paper highlights the need to bring document classification benchmarking closer to real-world applications, both in the nature of data tested ($X$: multi-channel, multi-paged, multi-industry; $Y$:...

Aadish Jain, Pratik Ratadiya, Harshit Varma et al.,
2021,
SemEval

This paper describes our approach for Task 9 of SemEval 2021: Statement Verification and Evidence Finding with Tables. We participated in both subtasks, namely statement verification and evidence find...

K. Perumalla, H. Sharif, M. Hempel et al.,
2022,
ACM Trans. Manag. Inf. Syst.

There is an urgent need in many critical infrastructure sectors, including the energy sector, for attaining detailed insights into cybersecurity features and compliance with cybersecurity requirements...

Pedro A. Szekely, Jay Pujara, Kexuan Sun et al.,
2021,
SIGIR

The task of natural language table retrieval (NLTR) seeks to retrieve semantically relevant tables based on natural language queries. Existing learning systems for this task often treat tables as plai...

The sheer volume of financial statements makes it difficult for humans to access and analyze a business’s financials. Robust numerical reasoning likewise faces unique challenges in thi...

Xiaomo Liu, Xianzhi Li, Zhiqiang Ma et al.,
2023,
EMNLP

The most recent large language models (LLMs) such as ChatGPT and GPT-4 have shown exceptional capabilities of generalist models, achieving state-of-the-art performance on a wide range of NLP tasks with...

The lack of data for information extraction (IE) from semi-structured business documents is a real problem for the IE community. Publications relying on large-scale datasets use only proprietary, unpu...

The first phase of table recognition is to detect the tabular area in a document. Subsequently, the tabular structures are recognized in the second phase in order to extract information from the respe...

Xiangping Wu, Qingcai Chen, Yang Fan et al.,
2023,
AAAI

The diversity of tables makes table detection a great challenge, leading to existing models becoming more tedious and complex. Despite achieving high performance, they often overfit to the table style...

R. Shah, Siddhesh Bangar, Avinash Anand et al.,
2023,
MMIR@MM

The automatic recognition of tabular data in document images presents a significant challenge due to the diverse range of table styles and complex structures. Tables offer valuable content representat...

Oren Etzioni, Boya Xie, Kuansan Wang et al.,
2020,
NLPCOVID19

The COVID-19 Open Research Dataset (CORD-19) is a growing resource of scientific papers on COVID-19 and related historical coronavirus research. CORD-19 is designed to facilitate the development of te...

A. Albu, Melissa Cote, Amanda Dash,
2023,
DocEng

Tables, ubiquitous in data-oriented documents like scientific papers and financial statements, organize and convey relational information. Automatic table recognition from document images, which invol...