Extracting Features for Machine Learning in NTCIR-10 RITE Task

The NTCIR-9 RITE task evaluates systems that automatically detect entailment, paraphrase, and contradiction in texts. For NTCIR-9 we developed a preliminary rule-based system; for NTCIR-10 we tried machine learning approaches instead. We transformed the existing rules into features and then added further syntactic and semantic features for an SVM classifier. We kept the straightforward assumption from NTCIR-9: the relation between two sentences is determined by the parts in which they differ rather than by the parts they share. Accordingly, we used the NTCIR-9 features (sentence lengths, the content of matched keywords, the number of matched keywords, and their parts of speech) together with new features such as parse tree information, dependency relations, negation words, and synonyms. We found that some features were more useful for the binary-class (BC) subtask, while others helped more in the multi-class (MC) subtask.
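The difference-based assumption above can be illustrated with a small feature extractor for a sentence pair. This is a minimal sketch, not the authors' implementation: it assumes both sentences are already word-segmented and POS-tagged, and the negation list, synonym table, coarse POS tags, and feature names are all hypothetical. Parse-tree and dependency features are omitted.

```python
from collections import Counter

# Hypothetical resources for illustration only; the paper's actual lexicons are not given.
NEGATION_WORDS = {"不", "沒", "沒有", "未", "非"}
SYNONYMS = {"首都": "國都", "國都": "首都"}
POS_TAGS = ("N", "V", "A", "D")  # hypothetical coarse POS tag set

Token = tuple[str, str]  # (word, POS tag), assumed pre-segmented and tagged


def _matches(word: str, other_words: set[str]) -> bool:
    """A word matches if it, or its listed synonym, appears in the other sentence."""
    return word in other_words or SYNONYMS.get(word) in other_words


def extract_features(t1: list[Token], t2: list[Token]) -> dict[str, float]:
    """Build length, matching, negation, and POS-of-difference features for (t1, t2)."""
    words1 = {w for w, _ in t1}
    words2 = {w for w, _ in t2}
    matched = [w for w, _ in t2 if _matches(w, words1)]
    # The differing parts are what the features focus on.
    unmatched1 = [(w, p) for w, p in t1 if not _matches(w, words2)]
    unmatched2 = [(w, p) for w, p in t2 if not _matches(w, words1)]
    neg1 = sum(w in NEGATION_WORDS for w, _ in t1)
    neg2 = sum(w in NEGATION_WORDS for w, _ in t2)
    pos_diff = Counter(p for _, p in unmatched1 + unmatched2)

    features = {
        "len_t1": len(t1),
        "len_t2": len(t2),
        "len_diff": len(t1) - len(t2),
        "n_matched": len(matched),
        "match_ratio_t2": len(matched) / len(t2) if t2 else 0.0,
        "n_unmatched_t1": len(unmatched1),
        "n_unmatched_t2": len(unmatched2),
        # 1 if the negation counts differ in parity (a crude polarity-flip signal).
        "negation_mismatch": int(neg1 % 2 != neg2 % 2),
    }
    # Count parts of speech among the unmatched (differing) words.
    for pos in POS_TAGS:
        features[f"diff_pos_{pos}"] = pos_diff[pos]
    return features


def to_vector(features: dict[str, float]) -> list[float]:
    """Flatten features in sorted-name order so every pair gets the same layout."""
    return [float(features[name]) for name in sorted(features)]


if __name__ == "__main__":
    t1 = [("台北", "N"), ("是", "V"), ("台灣", "N"), ("的", "DE"), ("首都", "N")]
    t2 = [("台北", "N"), ("不", "D"), ("是", "V"), ("台灣", "N"), ("的", "DE"), ("國都", "N")]
    feats = extract_features(t1, t2)
    print(feats)
    print(to_vector(feats))
```

In this example the synonym pair 首都/國都 counts as matched, so the only differing word is the negation 不, which shows up in `n_unmatched_t2`, `negation_mismatch`, and `diff_pos_D`. Vectors like these could then be passed to any SVM implementation (for example scikit-learn's `SVC`) trained on BC or MC labels.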

Lun-Wei Ku, Edward T.-H. Chu, Nai-Hsuan Han