Extracting Features for Machine Learning in NTCIR-10 RITE Task

The NTCIR-9 RITE task evaluates systems that automatically detect entailment, paraphrase, and contradiction in texts. We developed a preliminary rule-based system for the NTCIR-9 RITE task. In NTCIR-10, we tried machine learning approaches. We transformed the existing rules into features and then added syntactic and semantic features for an SVM. We kept the same straightforward assumption in NTCIR-10: the relation between two sentences is determined by the parts in which they differ rather than the parts they share. We therefore considered the NTCIR-9 features, including sentence lengths, the content of matched keywords, quantities of matched keywords, and their parts of speech, together with new features such as parse tree information, dependency relations, negation words, and synonyms. We found that some features were useful for the BC subtask, while others helped more in the MC subtask.
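
As a rough illustration of the kind of pair-level features described above, the following minimal sketch computes sentence lengths, matched-token counts, and counts over the differing parts of a sentence pair. It is not the system's actual feature extractor: the function name `extract_features`, the whitespace tokenization, the English example, and the small `NEGATION_WORDS` list are all assumptions made for illustration. Syntactic features such as parse trees and dependency relations are omitted.

```python
from collections import Counter

# Hypothetical negation lexicon, assumed for illustration only.
NEGATION_WORDS = {"not", "no", "never"}


def extract_features(t1, t2):
    """Return a small numeric feature vector for the sentence pair (t1, t2).

    Tokenization is plain whitespace splitting, which is an assumption;
    a real system would use a proper word segmenter and POS tagger.
    """
    tokens1 = t1.lower().split()
    tokens2 = t2.lower().split()
    c1, c2 = Counter(tokens1), Counter(tokens2)

    matched = c1 & c2   # multiset of tokens shared by both sentences
    only1 = c1 - c2     # tokens that appear only in t1
    only2 = c2 - c1     # tokens that appear only in t2

    # Count negation words that occur in the differing parts only.
    negations = sum(
        n for diff in (only1, only2)
        for w, n in diff.items() if w in NEGATION_WORDS
    )

    return [
        len(tokens1),           # length of t1
        len(tokens2),           # length of t2
        sum(matched.values()),  # number of matched tokens
        sum(only1.values()),    # number of tokens only in t1
        sum(only2.values()),    # number of tokens only in t2
        negations,              # negation words in the differing parts
    ]


features = extract_features("the cat is on the mat",
                            "the cat is not on the mat")
print(features)
```

Vectors of this form, one per sentence pair, could then be passed to an SVM implementation of the reader's choice together with the gold labels for the BC or MC subtask.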