Are Training Resources Insufficient? Predict First Then Explain!

Natural-language free-text explanation generation is an efficient way to train explainable language processing models for tasks that require commonsense knowledge. The predominant form of such models is the explain-then-predict (EtP) structure, which first generates an explanation and then uses it to make a prediction. By construction, the performance of EtP models depends heavily on that of the explainer, so a large amount of explanation data is needed to train a good explainer. However, annotating explanations is expensive. Moreover, recent work shows that free-text explanations may not convey enough information for decision making. These observations cast doubt on the effectiveness of EtP models. In this paper, we argue that the predict-then-explain (PtE) architecture is the more efficient approach from a modelling perspective. Our contribution is twofold. First, we show that the PtE structure is the most data-efficient approach when explanation data are scarce. Second, we show that the PtE structure is always more training-efficient than the EtP structure. We also provide experimental results that confirm these theoretical advantages.

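To make the architectural contrast concrete, the sketch below shows how the two pipelines chain a single text-to-text model. This is a minimal illustration, not the authors' implementation: the `generate` stub and the prompt formats are assumptions standing in for whatever trained sequence-to-sequence model and input templates are actually used.

```python
# Minimal sketch (not the paper's code): EtP vs. PtE pipelines built around a
# single hypothetical text-to-text generation call.

def generate(prompt: str) -> str:
    """Placeholder for a trained text-to-text model (e.g., a T5-style model)."""
    raise NotImplementedError  # assumption: some model maps prompt text to output text


def explain_then_predict(question: str, choices: list[str]) -> tuple[str, str]:
    # EtP: the explainer runs first and its output is fed to the predictor,
    # so prediction quality is bounded by explanation quality.
    context = f"question: {question} choices: {', '.join(choices)}"
    explanation = generate(f"explain: {context}")
    label = generate(f"{context} explanation: {explanation}")
    return label, explanation


def predict_then_explain(question: str, choices: list[str]) -> tuple[str, str]:
    # PtE: the predictor runs first on the raw input; the explainer only
    # justifies the decision that has already been made, so it is not a
    # bottleneck for task accuracy.
    context = f"question: {question} choices: {', '.join(choices)}"
    label = generate(context)
    explanation = generate(f"{context} answer: {label} explain:")
    return label, explanation
```

Under this framing, the PtE predictor can be trained on label-only data even when explanation annotations are scarce, which is the data-efficiency argument the abstract makes.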