Textual entailment recognition is the task of deciding, given two text fragments, whether the meaning of one text can be deduced from the other. This year, for our third participation in the RTE competition, we improved the system we built for RTE4.

Main Task: The main idea of our system is to map every word in the hypothesis to one or more words in the text. To do so, we transform the hypothesis using extensive semantic knowledge from sources such as DIRT, WordNet, VerbOcean, Wikipedia and the Acronym database. This year's main improvement concerned the pre-processing step. Last year we observed that this step can improve the quality of the output of the tools we use (LingPipe and Minipar). We focused on this step because this year's texts came from a variety of sources and were not edited from their source documents. In particular, we identify and remove special characters that occur frequently on web pages. This choice rests on the observation that the meaning of the text is the same with or without these characters, while the quality of the tools' output improves when they are removed. Additionally, we process the LingPipe output with GATE in order to identify named entity categories that LingPipe does not recognize, such as nationality, language, and job. However, one of the strongest components of last year's system, the one responsible for solving contradiction cases, did not function properly this year. Our system also mishandled cases with very long texts and very short hypotheses in which most of the hypothesis words were found in the text, because we did not make proper use of the differences revealed by semantic role labeling.

Pilot Task: For the new pilot task introduced this year, we used Lucene to index the documents in which we had to identify sentences that entail a given hypothesis.
We searched this index using the initial hypotheses and, after filtering the results returned by Lucene, applied our RTE system.
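The two ideas described above can be illustrated with a toy Python sketch. The first is the main-task mapping of every hypothesis word to words in the text, possibly after rewriting it with a lexical resource. The second is the pilot-task pipeline: index, search with the hypothesis, filter, then apply the RTE decision. This is a minimal sketch under stated assumptions, not our actual system. The `EXPANSIONS` table is a hypothetical stand-in for DIRT, WordNet, VerbOcean, Wikipedia and the acronym database. The special-character set and the 0.75 coverage threshold are assumptions. A small in-memory inverted index replaces Lucene, and `min_hits` is a hypothetical filter, since the filtering criterion is not specified here.

```python
import re
from collections import defaultdict

# Hypothetical expansion table standing in for DIRT, WordNet, VerbOcean,
# Wikipedia and the acronym database: each hypothesis word maps to
# alternative words it may appear as in the text. Entries are invented.
EXPANSIONS = {
    "bought": {"purchased", "acquired"},
    "car": {"automobile", "vehicle"},
}

# Assumed set of web-page debris characters; the real list is not given.
SPECIAL_CHARS = re.compile(r"[\u00a0\u200b|•►«»*#~^]+")


def preprocess(text):
    """Replace special characters with spaces, lowercase, and tokenize."""
    cleaned = SPECIAL_CHARS.sub(" ", text)
    return re.findall(r"[a-z0-9]+", cleaned.lower())


def coverage(hypothesis, text):
    """Fraction of hypothesis words mapped to at least one text word."""
    h_tokens = preprocess(hypothesis)
    t_tokens = set(preprocess(text))
    if not h_tokens:
        return 0.0
    mapped = sum(
        1 for word in h_tokens
        if ({word} | EXPANSIONS.get(word, set())) & t_tokens
    )
    return mapped / len(h_tokens)


def entails(hypothesis, text, threshold=0.75):
    """Toy decision rule: entailment if enough hypothesis words are mapped."""
    return coverage(hypothesis, text) >= threshold


def build_index(sentences):
    """In-memory inverted index (a stand-in for Lucene): token -> sentence ids."""
    index = defaultdict(set)
    for sid, sentence in enumerate(sentences):
        for token in preprocess(sentence):
            index[token].add(sid)
    return index


def search(index, hypothesis, min_hits=2):
    """Query with the raw hypothesis; keep sentences sharing >= min_hits tokens."""
    hits = defaultdict(int)
    for token in set(preprocess(hypothesis)):
        for sid in index.get(token, ()):
            hits[sid] += 1
    # Hypothetical filtering step: drop weakly matching sentences.
    return sorted(sid for sid, n in hits.items() if n >= min_hits)


def find_entailing(sentences, hypothesis):
    """Retrieve candidates, then run the toy RTE decision on each one."""
    index = build_index(sentences)
    return [sid for sid in search(index, hypothesis)
            if entails(hypothesis, sentences[sid])]


sentences = [
    "John purchased a new automobile last week.",
    "The weather was sunny all week.",
    "John sold his bike.",
]
print(find_entailing(sentences, "John bought a car."))  # prints [0]
```

The search deliberately uses the unexpanded hypothesis, mirroring the use of the initial hypotheses as queries. Lexical expansion only comes into play in the entailment step, which is why sentence 0 is retrieved through "john" and "a" and then accepted through "purchased" and "automobile".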