TAT-QA: A Question Answering Benchmark on a Hybrid of Tabular and Textual Content in Finance

Hybrid data combining both tabular and textual content (e.g., financial reports) is pervasive in the real world. However, Question Answering (QA) over such hybrid data is largely neglected in existing research. In this work, we extract samples from real financial reports to build a new large-scale QA dataset containing both Tabular And Textual data, named TAT-QA, where numerical reasoning is usually required to infer the answer, such as addition, subtraction, multiplication, division, counting, comparison/sorting, and their compositions. We further propose a novel QA model termed TAGOP, which is capable of reasoning over both tables and text. It adopts sequence tagging to extract relevant cells from the table along with relevant spans from the text to infer their semantics, and then applies symbolic reasoning over them with a set of aggregation operators to arrive at the final answer. According to our experiments on TAT-QA, TAGOP achieves 58.0% in F1, an absolute increase of 11.1% over the previous best baseline model. This result still lags far behind the performance of human experts, i.e., 90.8% in F1. This demonstrates that TAT-QA is very challenging and can serve as a benchmark for training and testing powerful QA models that address hybrid data. Our dataset is publicly available for noncommercial use at https://nextplusplus.github.io/TAT-QA/.
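The tag-then-aggregate idea described above can be illustrated with a minimal sketch. It is not the TAGOP implementation: the tags are supplied by hand rather than predicted by a model, and the operator names, the `parse_number` helper, and the example values are all hypothetical, chosen only to mirror the operations listed in the abstract.

```python
from typing import Callable, Dict, List

# Hypothetical operator set, loosely mirroring the operations named in the
# abstract (addition, subtraction, multiplication, division, counting).
OPERATORS: Dict[str, Callable[[List[float]], float]] = {
    "sum": lambda xs: sum(xs),
    "difference": lambda xs: xs[0] - xs[1],
    "product": lambda xs: xs[0] * xs[1],
    "division": lambda xs: xs[0] / xs[1],
    "count": lambda xs: float(len(xs)),
}


def parse_number(text: str) -> float:
    """Convert a cell or span such as '$1,200' or '(300)' to a float.

    Parentheses are treated as a negative sign, a common convention in
    financial tables (an assumption of this sketch).
    """
    cleaned = text.strip().replace(",", "").replace("$", "")
    negative = cleaned.startswith("(") and cleaned.endswith(")")
    value = float(cleaned.strip("()"))
    return -value if negative else value


def select_tagged(tokens: List[str], tags: List[int]) -> List[str]:
    """Keep the tokens whose tag is 1, i.e. those marked as evidence."""
    return [tok for tok, tag in zip(tokens, tags) if tag == 1]


def answer(tokens: List[str], tags: List[int], operator: str) -> float:
    """Apply the chosen aggregation operator to the tagged evidence."""
    evidence = [parse_number(t) for t in select_tagged(tokens, tags)]
    return OPERATORS[operator](evidence)


# Made-up table cells and hand-written tags, for illustration only.
tokens = ["Revenue", "1,200", "900", "Cost", "(300)"]
print(answer(tokens, [0, 1, 1, 0, 0], "difference"))  # 1200 - 900
print(answer(tokens, [0, 1, 1, 0, 1], "sum"))         # 1200 + 900 - 300
print(answer(tokens, [0, 1, 1, 0, 1], "count"))       # three tagged cells
```

In a learned system, the tags and the operator choice would come from model predictions; here they are fixed inputs so that the symbolic step can be read in isolation.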
