Human Parity on CommonsenseQA: Augmenting Self-Attention with External Attention

Most of today’s AI systems focus on using self-attention mechanisms and transformer architectures on large amounts of diverse data to achieve impressive performance gains. In this paper, we propose to augment the transformer architecture with an external attention mechanism to bring external knowledge and context to bear. By integrating external information into the prediction process, we hope to reduce the need for ever-larger models and increase the democratization of AI systems. We find that the proposed external attention mechanism can significantly improve the performance of existing AI systems, allowing practitioners to easily customize foundation AI models for many diverse downstream applications. In particular, we focus on the task of commonsense reasoning, demonstrating that the proposed external attention mechanism can augment existing transformer models and significantly improve the model’s reasoning capabilities. The proposed system, Knowledgeable External Attention for commonsense Reasoning (KEAR), reaches human parity on the open CommonsenseQA research benchmark with an accuracy of 89.4%, compared with the human accuracy of 88.9%.
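
To make the idea of external attention concrete, below is a minimal sketch (not the authors’ released code) of text-level knowledge integration for a multiple-choice question: retrieved knowledge text is concatenated to the question and each answer choice, so the transformer’s self-attention also attends over the external knowledge when scoring the choices. The `retrieve_knowledge` helper is a hypothetical placeholder standing in for whatever knowledge sources are used (e.g., ConceptNet triples or dictionary definitions), and the model name is only an illustrative assumption.

```python
# Minimal sketch of input-level external attention for multiple-choice QA.
# Assumptions: a hypothetical retrieve_knowledge() helper and an off-the-shelf
# pretrained encoder; this is illustrative, not the KEAR implementation.
import torch
from transformers import AutoTokenizer, AutoModelForMultipleChoice

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-large")
model = AutoModelForMultipleChoice.from_pretrained("microsoft/deberta-v3-large")

def retrieve_knowledge(question: str, choice: str) -> str:
    # Placeholder: return knowledge text relevant to (question, choice),
    # e.g., retrieved triples, definitions, or related training examples.
    return "external knowledge text relevant to the question and choice"

def score_choices(question: str, choices: list[str]) -> torch.Tensor:
    firsts, seconds = [], []
    for choice in choices:
        knowledge = retrieve_knowledge(question, choice)
        firsts.append(question)
        # Concatenate the choice with its retrieved knowledge so self-attention
        # spans both the original input and the external context.
        seconds.append(choice + " " + knowledge)
    enc = tokenizer(firsts, seconds, padding=True, truncation=True, return_tensors="pt")
    # Multiple-choice models expect tensors of shape (batch, num_choices, seq_len).
    enc = {k: v.unsqueeze(0) for k, v in enc.items()}
    with torch.no_grad():
        logits = model(**enc).logits  # shape: (1, num_choices)
    return logits.softmax(dim=-1).squeeze(0)
```

Because the knowledge is injected as plain text at the input, no architectural change to the encoder is required; a fine-tuned model of this form can then be swapped in for any transformer that accepts the concatenated sequence.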
