Semantic Answer Validation in Question Answering Systems for Reading Comprehension Tests

This paper presents a methodology for tackling the problem of answer validation in question answering for reading comprehension tests. The implemented system accepts a document as input and answers multiple-choice questions about it based on semantic similarity measures. It uses the Lucene information retrieval engine to carry out information extraction, supported by additional automated linguistic processing such as stemming, anaphora resolution, and part-of-speech tagging. The proposed approach validates the answers by comparing the text retrieved by Lucene for each question against its candidate answers; this comparison is performed with a semantic similarity measure. We evaluated the proposed methodology on a corpus widely used in international forums. The results show that the proposed system selects the correct answer to a given question 12% more often than a validation based on lexical similarity.
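To make the validation step concrete, the following is a minimal sketch of semantic answer validation over a passage retrieved for a question. The abstract does not fix the specific semantic measure, so WordNet path similarity (via NLTK) is assumed here purely for illustration; the function names and the averaging scheme are likewise hypothetical, not the paper's implementation.

```python
# Minimal sketch of semantic answer validation (assumption: WordNet path
# similarity stands in for the paper's unspecified semantic measure).
# The passage is assumed to be the text Lucene retrieved for the question.
from nltk.corpus import wordnet as wn  # requires: nltk.download('wordnet')


def word_similarity(w1, w2):
    """Best WordNet path similarity between any senses of two words."""
    scores = [s1.path_similarity(s2) or 0.0
              for s1 in wn.synsets(w1)
              for s2 in wn.synsets(w2)]
    return max(scores, default=0.0)


def semantic_similarity(passage, answer):
    """Average, over the answer's words, of each word's best match in the passage."""
    passage_words = passage.lower().split()
    answer_words = answer.lower().split()
    if not answer_words or not passage_words:
        return 0.0
    return sum(max(word_similarity(a, p) for p in passage_words)
               for a in answer_words) / len(answer_words)


def validate(passage, candidates):
    """Select the candidate answer most semantically similar to the passage."""
    return max(candidates, key=lambda c: semantic_similarity(passage, c))


# Usage: the retrieved passage plus the multiple-choice options;
# the option closest in meaning to the passage is selected.
passage = "The dog chased the cat across the yard."
options = ["a feline", "a bicycle", "a sandwich"]
print(validate(passage, options))
```

A lexical baseline would instead count exact word overlap between the passage and each option; the semantic measure can still score "cat" close to "feline", which is the kind of gap the reported 12% improvement reflects.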