ForecastQA: A Question Answering Challenge for Event Forecasting with Temporal Text Data

Event forecasting is a challenging yet consequential task, as humans constantly seek to plan for the future. Existing automated forecasting approaches rely mostly on structured data, such as time series or event-based knowledge graphs, to help predict future events. In this work, we formulate the forecasting problem as a restricted-domain, multiple-choice question-answering (QA) task that simulates the forecasting scenario. To showcase the usefulness of this task formulation, we introduce ForecastQA, a question-answering dataset consisting of 10,392 event forecasting questions, which have been collected and verified via crowdsourcing. We also present experiments on ForecastQA using BERT-based models and find that our best model achieves 61.0% accuracy on the dataset, which still trails human performance by about 18%. We hope ForecastQA will support future research efforts in bridging this gap.
