National University of Singapore at the TREC 13 Question Answering Main Task

In our TREC participation over the past two years, our efforts (Yang et al., 2002, 2003) have focused on incorporating external knowledge to boost document and passage retrieval performance in event-based open-domain question answering (QA). Despite our previous successes, we have identified three weaknesses of our system with respect to this year's task guidelines.

First, our system works at the surface level to extract answers, picking the first occurrence of a string that matches the question target type from the highest-ranked passage. As such, our answer extraction relies heavily on the results of passage retrieval and named entity tagging. However, a passage that contains the correct answer may also contain other strings of the same target type (Light et al., 2001), which can lead to an incorrect string being extracted. A technique is needed to select the answer string that bears the correct relationships to the other words in the question.

Second, our definitional QA system utilizes manually constructed definition patterns. While these patterns are precise in selecting definition sentences, their matching is strict (slot-by-slot matching using regular expressions), and they fail to match correct sentences that exhibit minor variations.

Third, this year's guidelines state that factoid and list questions are not independent; instead, they are all related to given topics. Under such a contextual QA scenario, we need to revise our framework to exploit existing topic-relevant knowledge in answering these questions.

Accordingly, we focus on the following three features in this year's TREC: (1) To give appropriate evidence to answer extraction, we use grammatical dependency relations among question terms to reinforce answer selection. In contrast to previous work on matching dependency relations exactly, we propose to measure the similarity between relations to rank answer strings. Short sketches illustrating these points follow below.
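As a concrete illustration of the first weakness, the following is a minimal sketch (our own, not the system's actual code) of surface-level extraction: return the first string in the top-ranked passage whose named entity type matches the question target type. The tag_entities argument is a hypothetical stand-in for any named entity tagger.

def surface_extract(passages, target_type, tag_entities):
    # passages: passage strings, ranked best-first.
    # tag_entities: hypothetical NE tagger returning (string, type) pairs.
    for passage in passages:
        for text, entity_type in tag_entities(passage):
            if entity_type == target_type:
                return text  # first string of the target type wins, right or wrong
    return None

For "When was Mozart born?" (target type DATE), a passage such as "Mozart died in 1791; he was born in 1756" yields two DATE candidates, and the first-occurrence heuristic returns the wrong one.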
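The second weakness can be seen in a small hedged example; the regular expression below is our invented illustration of a slot-by-slot definition pattern, not one drawn from the actual pattern set.

import re

# An invented slot-by-slot definition pattern: "<TERM> is a/an <definition>".
pattern = re.compile(r"^(?P<term>[A-Z][\w ]+) is an? (?P<definition>.+)$")

print(bool(pattern.match("Aspirin is a drug that relieves pain")))     # True: canonical phrasing
print(bool(pattern.match("Aspirin, a drug that relieves pain, ...")))  # False: appositive variant is missed

The second sentence carries the same definition, but the rigid slot sequence cannot absorb the minor syntactic variation.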
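Finally, for feature (1), here is a minimal sketch, under assumed inputs, of how similarity between dependency relations might rank candidates instead of requiring exact relation matches. rel_paths is a hypothetical helper returning the labeled dependency path between two words in a parsed sentence, and the overlap score is only a stand-in for the actual similarity measure.

def path_similarity(q_path, a_path):
    # Fraction of the question's relation labels preserved in the answer path;
    # a stand-in for a statistical or learned similarity measure.
    return sum(1 for rel in q_path if rel in a_path) / max(len(q_path), 1)

def rank_candidates(question_paths, candidates, rel_paths):
    # question_paths: {question term: dependency path to the wh-word}.
    # candidates: (answer string, parsed answer sentence) pairs.
    scored = []
    for answer, sentence in candidates:
        sims = [path_similarity(q_path, rel_paths(sentence, term, answer))
                for term, q_path in question_paths.items()]
        scored.append((sum(sims) / max(len(sims), 1), answer))
    return [answer for _, answer in sorted(scored, reverse=True)]

Under this scheme, a candidate whose relations to the question terms resemble those in the question outranks a same-type distractor that merely co-occurs in the passage.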