Towards a Novel Association Measure via Web Search Results Mining

A web-based association measure aims to evaluate the semantic similarity between two queries (i.e., words or entities) by leveraging the search results returned by search engines. Existing web-relevance similarity measures usually treat all search results for a query as a single, coarse-grained topic: they concatenate the results for each query into a single document and measure the similarity between the resulting term vectors. This paper proposes a novel association measure, named WSRCM, which is based on web search results clustering and matching and evaluates the semantic similarity between two queries at a fine-grained level. WSRCM first discovers the subtopics in the search results for each query and then measures the consistency between the two queries' sets of subtopics. Each subtopic of a query is expected to describe a unique facet of the query, and two queries that share more subtopics are deemed more semantically related. Experimental results demonstrate the encouraging performance of the proposed measure.
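The contrast between the coarse-grained baseline and subtopic matching can be illustrated with a minimal Python sketch. The abstract does not specify WSRCM's clustering algorithm or its matching criterion, so the sketch substitutes hypothetical stand-ins: raw term-frequency vectors compared by cosine similarity, single-pass threshold clustering for subtopic discovery, and greedy one-to-one subtopic matching scored with a Dice-style ratio. All function names, the 0.3 thresholds, and the toy snippets are illustrative assumptions, not the paper's method.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Bag-of-words term-frequency vector over lowercased alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse term vectors (0.0 if either is empty)."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def coarse_similarity(results_a, results_b):
    """Baseline: concatenate each query's snippets into one document and compare term vectors."""
    return cosine(term_vector(" ".join(results_a)), term_vector(" ".join(results_b)))


def cluster_subtopics(results, threshold=0.3):
    """Hypothetical subtopic discovery via single-pass clustering.

    Each snippet joins the most similar existing cluster if their cosine similarity
    is at least `threshold`; otherwise it starts a new cluster. Returns one summed
    term vector per cluster.
    """
    clusters = []
    for snippet in results:
        vec = term_vector(snippet)
        sims = [cosine(vec, c) for c in clusters]
        if sims and max(sims) >= threshold:
            clusters[sims.index(max(sims))] += vec  # merge term counts into that cluster
        else:
            clusters.append(vec)
    return clusters


def subtopic_consistency(subs_a, subs_b, match_threshold=0.3):
    """Hypothetical matching step: greedy one-to-one pairing of subtopics.

    Pairs are taken in descending order of similarity; a pair counts as a shared
    subtopic if its cosine is at least `match_threshold`. The score is
    2 * shared / (|A| + |B|), which lies in [0, 1].
    """
    if not subs_a or not subs_b:
        return 0.0
    pairs = sorted(
        ((cosine(a, b), i, j) for i, a in enumerate(subs_a) for j, b in enumerate(subs_b)),
        reverse=True,
    )
    used_a, used_b, shared = set(), set(), 0
    for sim, i, j in pairs:
        if sim < match_threshold:
            break  # remaining pairs are no more similar than this one
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            shared += 1
    return 2 * shared / (len(subs_a) + len(subs_b))


def fine_grained_similarity(results_a, results_b):
    """WSRCM-style score under the assumptions above: cluster each query, then match subtopics."""
    return subtopic_consistency(cluster_subtopics(results_a), cluster_subtopics(results_b))


if __name__ == "__main__":
    # Toy snippets, invented for illustration only.
    results_jaguar = ["jaguar car dealer price", "jaguar cat habitat rainforest", "jaguar car engine review"]
    results_car = ["jaguar car dealer price"]
    print(fine_grained_similarity(results_jaguar, results_jaguar))  # 1.0
    print(fine_grained_similarity(results_jaguar, results_car))     # 0.6666666666666666
    print(coarse_similarity(results_jaguar, results_car))
```

In the toy run, the first query's snippets split into a "car" subtopic and an "animal" subtopic, while the second query covers only the "car" facet; it therefore shares one of the two subtopics and scores 2/3 under the fine-grained measure. The baseline, by contrast, compares it against a single vector that blends both facets.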