ICRCS at Intent2: Applying Rough Set and Semantic Relevance for Subtopic Mining

The goal of the subtopic mining subtask of the NTCIR-10 Intent-2 task is to return a ranked list of subtopics. To this end, this paper proposes a method that applies rough set theory to reduce redundancy among subtopics mined from webpages. In addition, semantic similarity, computed from semantic features extracted with NLP tools and a semantic dictionary, is used as the subtopic relevance measure in the re-ranking process. Using the reduction concept of rough sets, we first construct a rough-set-based model (RSBM) for subtopic mining. Next, we combine rough set theory and semantic relevance into a new model (RS&SRM). Evaluation results show the effectiveness of our approach compared with a baseline frequency-term-based model (FTBM). The best performance is achieved by RS&SRM, with an I-rec of 0.4046, a D-nDCG of 0.4413, and a D#-nDCG of 0.4229 on the Chinese subtopic mining subtask.
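As a reading aid for the three reported scores: D#-nDCG is conventionally defined as a linear combination of I-rec and D-nDCG. The minimal sketch below assumes the equal weighting (gamma = 0.5) commonly used in NTCIR Intent evaluations, which the abstract itself does not state. The function name `d_sharp` is hypothetical and chosen for illustration only.

```python
def d_sharp(i_rec: float, d_measure: float, gamma: float = 0.5) -> float:
    """Combine intent recall and a D-measure into a D#-measure.

    gamma = 0.5 is an assumed weighting; it is not given in the abstract.
    """
    return gamma * i_rec + (1.0 - gamma) * d_measure


# Scores reported for RS&SRM on the Chinese subtopic mining subtask.
combined = d_sharp(i_rec=0.4046, d_measure=0.4413)
print(f"D#-nDCG from reported I-rec and D-nDCG: {combined:.5f}")
```

Under that assumption, the combined value agrees with the reported D#-nDCG of 0.4229 up to rounding of the published figures.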
