In this paper we present a supervised method for recognizing entailment between binary lexicosyntactic patterns such as “X is the capital of Y” and “X is in Y”. Recognizing entailment relations between patterns is useful for applications such as question answering, which is our main motivation in this work. Since sentences that entail each other are natural paraphrases, entailment is closely related to paraphrasing. Many researchers have successfully used unsupervised methods based on distributional similarity for paraphrase acquisition [4, 6, 1], and our own experience with NICT’s spoken question answering system Ikkyu [7] confirms their effectiveness.
If Ikkyu could also detect that “X is the capital of Y” entails “X is in Y”, it would be able to answer the question “Where is Paris?” from the information that “Paris is the capital of France”. However, these two patterns are not strict paraphrases, and their distributional profiles differ substantially. Ikkyu’s current paraphrasing engine is based on distributional similarity between patterns and is therefore highly sensitive to such differences, which is why it currently cannot use that information to answer the question. By adding an accurate and robust entailment recognition module that can recognize entailment pairs even when their distributional profiles differ greatly, we aim to further improve Ikkyu’s recall.
In this work we explore a supervised method for entailment recognition that uses both distributional similarities and surface/syntactic features. We show that this supervised approach outperforms state-of-the-art unsupervised methods, such as DIRT [4] and the scoring method of [2], as well as supervised methods that consider only surface similarity, such as [5]. This holds for all types of pattern pairs, even those with very low surface similarity (i.e., sharing no content words).
Our approach targets Japanese but is easily applicable to other languages. Section 2 describes the resources and features used, and Section 3 presents our experimental methodology and discusses our results.
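As a rough illustration of why a purely distributional engine can miss an entailment pair, the sketch below treats each pattern's distributional profile as the set of (X, Y) argument pairs it co-occurs with and compares profiles with plain Jaccard overlap. This is an assumption made for illustration only: Jaccard stands in for the similarity measures used by Ikkyu or DIRT and is not either of them, the argument pairs are invented rather than drawn from a corpus, and the third pattern, “X is Y's capital”, is a hypothetical paraphrase added for contrast.

```python
# Toy illustration only: every argument pair below is invented, not corpus data.

def jaccard(a: set, b: set) -> float:
    """Symmetric overlap between two sets of (X, Y) argument pairs."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical distributional profiles: the (X, Y) pairs seen with each pattern.
profiles = {
    "X is the capital of Y": {
        ("Paris", "France"), ("Tokyo", "Japan"), ("Rome", "Italy"),
    },
    # Hypothetical near-paraphrase: almost the same argument pairs.
    "X is Y's capital": {
        ("Paris", "France"), ("Tokyo", "Japan"), ("Rome", "Italy"),
        ("Berlin", "Germany"),
    },
    # Entailed but much more general pattern: many additional argument pairs.
    "X is in Y": {
        ("Paris", "France"), ("Tokyo", "Japan"), ("Rome", "Italy"),
        ("Kyoto", "Japan"), ("Lyon", "France"), ("Milan", "Italy"),
        ("the Louvre", "Paris"), ("Mount Fuji", "Japan"),
        ("the key", "the drawer"),
    },
}

capital = profiles["X is the capital of Y"]
paraphrase_score = jaccard(capital, profiles["X is Y's capital"])
entailment_score = jaccard(capital, profiles["X is in Y"])

print(f"paraphrase pair: {paraphrase_score:.2f}")
print(f"entailment pair: {entailment_score:.2f}")
```

In this toy setting the entailed pattern covers every argument pair of the entailing one, yet its symmetric score is lower than that of the paraphrase pair because its profile is much broader. This is the kind of difference that motivates combining distributional similarity with surface/syntactic features in a supervised classifier.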
[1] Kentaro Torisawa, et al. Inducing Gazetteers for Named Entity Recognition by Large-Scale Clustering of Dependency Relations. ACL, 2008.
[2] Patrick Pantel, et al. Discovery of inference rules for question-answering. Natural Language Engineering, 2001.
[3] Patrick Pantel, et al. Espresso: Leveraging Generic Patterns for Automatically Harvesting Semantic Relations. ACL, 2006.
[4] Masaki Murata, et al. Large-Scale Verb Entailment Acquisition from the Web. EMNLP, 2009.
[5] Ion Androutsopoulos, et al. Learning Textual Entailment using SVMs and String Similarity Measures. ACL-PASCAL@ACL, 2007.
[6] Kentaro Torisawa, et al. Similarity Based Language Model Construction for Voice Activated Open-Domain Question Answering. IJCNLP, 2011.
[7] Masaki Murata, et al. Large Scale Relation Acquisition Using Class Dependent Patterns. 2009 Ninth IEEE International Conference on Data Mining, 2009.