Creating Causal Embeddings for Question Answering with Minimal Supervision

A common model for question answering (QA) is that a good answer is one that is closely related to the question, where relatedness is often determined using general-purpose lexical models such as word embeddings. We argue that a better approach is to look for answers that are related to the question in a relevant way, according to the information need of the question, which may be determined through task-specific embeddings. With causality as a use case, we implement this insight in three steps. First, we generate causal embeddings cost-effectively by bootstrapping cause-effect pairs extracted from free text using a small set of seed patterns. Second, we train dedicated embeddings over this data, by using task-specific contexts, i.e., the context of a cause is its effect. Finally, we extend a state-of-the-art reranking approach for QA to incorporate these causal embeddings. We evaluate the causal embedding models both directly with a casual implication task, and indirectly, in a downstream causal QA task using data from Yahoo! Answers. We show that explicitly modeling causality improves performance in both tasks. In the QA task our best model achieves 37.3% P@1, significantly outperforming a strong baseline by 7.7% (relative).

[1]  Jennifer Chu-Carroll,et al.  Building Watson: An Overview of the DeepQA Project , 2010, AI Mag..

[2]  Dan I. Moldovan,et al.  Text Mining for Causal Relations , 2002, FLAIRS.

[3]  Oren Etzioni Search needs a shake-up , 2011, Nature.

[4]  John Salvatier,et al.  Theano: A Python framework for fast computation of mathematical expressions , 2016, ArXiv.

[5]  Ming-Wei Chang,et al.  Question Answering Using Enhanced Lexical Semantic Models , 2013, ACL.

[6]  Danqi Chen,et al.  A Fast and Accurate Dependency Parser using Neural Networks , 2014, EMNLP.

[7]  Jong-Hoon Oh,et al.  Why-Question Answering using Intra- and Inter-Sentential Causal Relations , 2013, ACL.

[8]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[9]  Kuzman Ganchev,et al.  Semantic Role Labeling with Neural Network Factors , 2015, EMNLP.

[10]  Hae-Chang Rim,et al.  Joint Relational Embeddings for Knowledge-based Question Answering , 2014, EMNLP.

[11]  Andrew McCallum,et al.  Relation Extraction with Matrix Factorization and Universal Schemas , 2013, NAACL.

[12]  M Durie,et al.  Spinning straw into gold. , 1972, Mental hygiene.

[13]  Robert L. Mercer,et al.  The Mathematics of Statistical Machine Translation: Parameter Estimation , 1993, CL.

[14]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[15]  Eric Brill,et al.  Automatic question answering using the web: Beyond the Factoid , 2006, Information Retrieval.

[16]  Christopher S. G. Khoo,et al.  Automatic Extraction of Cause-Effect Information from Newspaper Text Without Knowledge-based Inferencing , 1998 .

[17]  Jennifer Chu-Carroll,et al.  IBM's PIQUANT II in TREC 2004 , 2004, TREC.

[18]  Peter Jansen,et al.  Discourse Complements Lexical Semantics for Non-factoid Answer Reranking , 2014, ACL.

[19]  Preslav Nakov,et al.  SemEval-2010 Task 8: Multi-Way Classification of Semantic Relations Between Pairs of Nominals , 2009, SEW@NAACL-HLT.

[20]  Yi Liu,et al.  Statistical Machine Translation for Query Expansion in Answer Retrieval , 2007, ACL.

[21]  Hal Daumé,et al.  Deep Unordered Composition Rivals Syntactic Methods for Text Classification , 2015, ACL.

[22]  Kezhi Mao,et al.  Multi level causal relation identification using extended features , 2014, Expert Syst. Appl..

[23]  Mihai Surdeanu,et al.  The Stanford CoreNLP Natural Language Processing Toolkit , 2014, ACL.

[24]  Hermann Ney,et al.  A Systematic Comparison of Various Statistical Alignment Models , 2003, CL.

[25]  Mihai Surdeanu,et al.  Learning to Rank Answers to Non-Factoid Questions from Web Collections , 2011, CL.

[26]  Marti A. Hearst Automatic Acquisition of Hyponyms from Large Text Corpora , 1992, COLING.

[27]  J.B. Bowles,et al.  A Lightweight Tool for Automatically Extracting Causal Relationships from Text , 2006, Proceedings of the IEEE SoutheastCon 2006.

[28]  Peter Clark,et al.  A study of the knowledge base requirements for passing an elementary science test , 2013, AKBC '13.

[29]  Mihai Surdeanu,et al.  Higher-order Lexical Semantic Models for Non-factoid Answer Reranking , 2015, TACL.

[30]  Dan Roth,et al.  Minimally Supervised Event Causality Identification , 2011, EMNLP.

[31]  Jason Weston,et al.  Question Answering with Subgraph Embeddings , 2014, EMNLP.

[32]  Omer Levy,et al.  Do Supervised Distributional Methods Really Learn Lexical Inference Relations? , 2015, NAACL.

[33]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[34]  Ellen Riloff,et al.  Automatically Generating Extraction Patterns from Untagged Text , 1996, AAAI/IAAI, Vol. 2.

[35]  Omer Levy,et al.  Dependency-Based Word Embeddings , 2014, ACL.

[36]  Mihai Surdeanu,et al.  Odin’s Runes: A Rule Language for Information Extraction , 2016, LREC.

[37]  Mirella Lapata,et al.  Distributed Representations for Unsupervised Semantic Role Labeling , 2015, EMNLP.

[38]  Benjamin Van Durme,et al.  Annotated Gigaword , 2012, AKBC-WEKEX@NAACL-HLT.

[39]  Stephen Clark,et al.  Specializing Word Embeddings for Similarity or Relatedness , 2015, EMNLP.

[40]  Daniel Marcu,et al.  A Noisy-Channel Approach to Question Answering , 2003, ACL.

[41]  Yulia Tsvetkov,et al.  Problems With Evaluation of Word Embeddings Using Word Similarity Tasks , 2016, RepEval@ACL.

[42]  Vibhu O. Mittal,et al.  Bridging the lexical chasm: statistical approaches to answer-finding , 2000, SIGIR '00.