BiRdQA: A Bilingual Dataset for Question Answering on Tricky Riddles

A riddle is a question or statement with a double or veiled meaning and an unexpected answer. Solving riddles is challenging for both machines and humans: it tests the ability to understand figurative, creative language and to reason with commonsense knowledge. We introduce BiRdQA, a bilingual multiple-choice question answering dataset with 6614 English riddles and 8751 Chinese riddles. For each riddle-answer pair, we provide four distractors with additional information from Wikipedia. The distractors are generated automatically at scale with minimal bias. Existing monolingual and multilingual QA models fail to perform well on our dataset, indicating that machines still have a long way to go before they can beat humans at solving tricky riddles. The dataset has been released to the community.
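To make the five-way multiple-choice setup concrete, below is a minimal sketch of what a single BiRdQA instance might look like: one riddle, the gold answer, and four distractors. The field names, the sample riddle, and the accuracy helper are illustrative assumptions, not the dataset's actual schema.

    # Hypothetical instance layout (field names are illustrative, not the
    # released schema): one riddle and five candidate answers, exactly one
    # of which is correct.
    example = {
        "riddle": "What has keys but can't open locks?",
        "choices": ["a piano", "a map", "a cage", "a locksmith", "a diary"],
        "label": 0,  # index of the gold answer among the five choices
    }

    # Trivial accuracy metric over a list of such instances, given one
    # predicted choice index per instance.
    def accuracy(predictions, instances):
        correct = sum(p == inst["label"] for p, inst in zip(predictions, instances))
        return correct / len(instances)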
