FinQA: A Dataset of Numerical Reasoning over Financial Data

The sheer volume of financial statements makes it difficult for humans to access and analyze a business's financials. Robust numerical reasoning likewise faces unique challenges in this domain. In this work, we focus on answering deep questions over financial data, aiming to automate the analysis of a large corpus of financial documents. In contrast to existing tasks in the general domain, the finance domain involves complex numerical reasoning and an understanding of heterogeneous representations. To facilitate analytical progress, we propose a new large-scale dataset, FINQA, with Question-Answering pairs over Financial reports, written by financial experts. We also annotate the gold reasoning programs to ensure full explainability. We further introduce baselines and conduct comprehensive experiments on our dataset. The results demonstrate that popular, large, pre-trained models fall far short of expert humans in acquiring finance knowledge and in complex multi-step numerical reasoning on that knowledge. Our dataset, the first of its kind, should therefore enable significant new community research into complex application domains.
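To make the idea of an annotated reasoning program concrete, the following is a minimal sketch of how a multi-step numerical program could be represented and executed. The step format, the operation names, the `#n` back-reference convention, and the figures are all assumptions chosen for illustration; they are not taken from the dataset's actual annotation scheme.

```python
from typing import Callable, Dict, List, Tuple, Union

# Hypothetical representation (an assumption, not the dataset's format):
# each step is (operation, arg1, arg2); an argument is either a number
# or a string such as "#0" that refers to the result of an earlier step.
Arg = Union[float, str]
Step = Tuple[str, Arg, Arg]

OPS: Dict[str, Callable[[float, float], float]] = {
    "add": lambda a, b: a + b,
    "subtract": lambda a, b: a - b,
    "multiply": lambda a, b: a * b,
    "divide": lambda a, b: a / b,
}


def run_program(steps: List[Step]) -> float:
    """Execute the steps in order and return the result of the last one."""
    if not steps:
        raise ValueError("program must contain at least one step")

    results: List[float] = []

    def resolve(arg: Arg) -> float:
        # "#k" looks up the result of step k; anything else is a literal.
        if isinstance(arg, str) and arg.startswith("#"):
            return results[int(arg[1:])]
        return float(arg)

    for op, a, b in steps:
        results.append(OPS[op](resolve(a), resolve(b)))
    return results[-1]


# Made-up figures: revenue of 200.0 in one year and 250.0 in the next.
# Step 0 computes the change; step 1 divides it by the earlier value.
program: List[Step] = [("subtract", 250.0, 200.0), ("divide", "#0", 200.0)]
growth = run_program(program)
```

Because each intermediate result is stored and referenced explicitly, every step of the calculation can be inspected, which is the sense in which a program annotation of this kind supports explainability.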
