
CM-PMI: improved web-based association measure with contextual label matching

WebPMI is a popular web-based association measure that evaluates the semantic similarity between two queries (i.e., words or entities) using the search results returned by search engines. This paper proposes a novel measure, CM-PMI, that evaluates query similarity at a finer granularity than WebPMI. It rests on the assumption that a query is usually associated with more than one aspect, and that two queries are semantically related if their aspect sets are highly consistent with each other. CM-PMI first extracts contextual labels from search results to represent the aspects of each query, and then uses an optimal matching method to assess the consistency between the aspects of the two queries. Experimental results on the benchmark Miller-Charles dataset demonstrate the effectiveness of CM-PMI. We further fuse WebPMI and CM-PMI to obtain improved results.
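The two ingredients named in the abstract can be sketched in Python: WebPMI computed from search-engine page counts, and an optimal one-to-one matching between the aspect labels of two queries. This is a minimal sketch under stated assumptions, not the paper's implementation. The WebPMI form below (zero when the co-occurrence count H(P∩Q) is at most a noise threshold c, otherwise the log-ratio of observed to expected co-occurrence over a corpus of N pages) is the commonly used one; the defaults N = 10^10 and c = 5 are assumed values. Label extraction is not shown: the label-to-label similarity matrix `sim` is assumed to be precomputed, and `aspect_consistency` (matched weight divided by the larger label count) is a hypothetical normalization, not CM-PMI's actual formula. The matching is found by exhaustive search over assignments, which reaches the same optimum as the Hungarian method but only scales to a handful of labels.

```python
import math
from itertools import permutations
from typing import Sequence


def web_pmi(hits_p: float, hits_q: float, hits_pq: float,
            n_docs: float = 1e10, threshold: float = 5) -> float:
    """WebPMI from page counts H(P), H(Q) and H(P AND Q).

    n_docs (N) and threshold (c) are assumed defaults, not values from the paper.
    """
    # Very small co-occurrence counts are treated as noise.
    if hits_pq <= threshold:
        return 0.0
    # log2((H(PQ)/N) / ((H(P)/N) * (H(Q)/N))), rearranged to avoid tiny floats.
    return math.log2(n_docs * hits_pq / (hits_p * hits_q))


def optimal_matching(sim: Sequence[Sequence[float]]) -> float:
    """Maximum total similarity of a one-to-one matching of rows to columns.

    sim[i][j] is an assumed, precomputed similarity between label i of query A
    and label j of query B. Exhaustive search: exact, but only practical for
    a few labels (the Hungarian method scales to larger sets).
    """
    if not sim or not sim[0]:
        return 0.0
    # Put the smaller label set on the rows so every row gets a distinct column.
    if len(sim) > len(sim[0]):
        sim = [list(column) for column in zip(*sim)]
    rows, cols = len(sim), len(sim[0])
    best = -math.inf
    for cols_for_rows in permutations(range(cols), rows):
        total = sum(sim[i][j] for i, j in enumerate(cols_for_rows))
        best = max(best, total)
    return best


def aspect_consistency(sim: Sequence[Sequence[float]]) -> float:
    """Hypothetical normalization: matched weight / size of the larger label set."""
    if not sim or not sim[0]:
        return 0.0
    return optimal_matching(sim) / max(len(sim), len(sim[0]))


# Greedy pairing would take 0.9 first and reach 0.9 + 0.1 = 1.0;
# the optimal matching pairs the labels crosswise for 0.8 + 0.8 = 1.6.
example = [[0.9, 0.8],
           [0.8, 0.1]]
print(web_pmi(100, 100, 40, n_docs=1000))  # 4x the expected co-occurrence -> 2.0
print(optimal_matching(example))
print(aspect_consistency(example))
```

Exhaustive search is used here only to keep the sketch dependency-free. For larger label sets, `scipy.optimize.linear_sum_assignment(sim, maximize=True)` solves the same assignment problem in polynomial time.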

Xiaojun Wan
