UAIC Participation at RTE5

Textual entailment recognition is the task of deciding, given two text fragments, whether the meaning of one text can be deduced from the other. This year, at our third participation in the RTE competition, we improved the system built for the RTE4 competition.

Main Task: The main idea of our system is to map every word in the hypothesis to one or more words in the text. To do so, we transform the hypothesis using extensive semantic knowledge from sources such as DIRT, WordNet, VerbOcean, Wikipedia and the Acronym database. The main improvement this year concerns the pre-processing part. Last year we observed that this part can improve the quality of the output of the tools we use (LingPipe and Minipar). Because this year the texts were obtained from a variety of sources and were not edited from their source documents, we focused on this part. Thus, we identify and eliminate special characters that occur frequently on web pages. This choice is based on the fact that the meaning of the text is the same with or without these characters, while the quality of the tools' output is improved. Additionally, we process the LingPipe output with GATE in order to identify some named entity categories that LingPipe does not identify, such as nationality, language, and job. One of the better components of last year's system, the one responsible for resolving contradiction cases, did not function properly this year. Also, cases in which the text was very long and the hypothesis very short, but most of the words in the hypothesis were found in the text, were not handled properly by our system, because we did not make proper use of the distinctions that come from semantic role labeling.

Pilot Task: For the new pilot task introduced this year, we used Lucene to index the documents in which we must identify sentences that entail a given hypothesis. On this index we performed searches using the initial hypotheses, and after filtering the results returned by Lucene, we applied our RTE system.
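
The pre-processing and word-mapping ideas of the Main Task can be illustrated with a small sketch. It is not the system's implementation: the set of characters removed, the `TOY_LEXICON` dictionary (a hand-written stand-in for resources such as WordNet or DIRT), and the function names `clean`, `map_hypothesis` and `coverage` are all assumptions made for illustration.

```python
import re

# Hypothetical stand-in for the external resources (WordNet, DIRT, ...):
# each hypothesis word maps to words accepted as equivalent in the text.
TOY_LEXICON = {
    "purchased": {"bought", "acquired"},
    "firm": {"company"},
}


def clean(text):
    """Remove web-page residue; the exact character set is an assumption."""
    text = re.sub(r"&[a-zA-Z]+;|&#\d+;", " ", text)   # HTML entities
    text = re.sub(r"[|*~^<>{}\[\]\\]", " ", text)     # stray symbols
    return re.sub(r"\s+", " ", text).strip()          # collapse whitespace


def tokens(text):
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def map_hypothesis(text, hypothesis, lexicon=TOY_LEXICON):
    """Map each hypothesis word to the text words that match it,
    either directly or through the lexicon."""
    text_words = set(tokens(clean(text)))
    mapping = {}
    for word in tokens(clean(hypothesis)):
        candidates = {word} | lexicon.get(word, set())
        mapping[word] = sorted(candidates & text_words)
    return mapping


def coverage(mapping):
    """Fraction of hypothesis words with at least one match in the text."""
    if not mapping:
        return 0.0
    return sum(1 for matches in mapping.values() if matches) / len(mapping)


if __name__ == "__main__":
    t = "Google&nbsp;bought the company | in 2006."
    h = "Google purchased the firm."
    m = map_hypothesis(t, h)
    print(m, coverage(m))
```

A coverage score of this kind is only a first signal. As noted above, a long text can contain most of the words of a short hypothesis without entailing it, which is why information such as semantic roles is needed on top of word mapping.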