Measuring the Semantic Relevance between Term and Short Text: Using the Concepts of Shortest Path Length and Relatively Important Community

The performance of information retrieval (IR) can be improved by query expansion (QE) [1-2]. Current research on Chinese QE focuses mainly on expanding a single term, which assumes that the user query is a short, complete term. However, a user query may be complex, i.e., expressed in natural language, such as "(lie ju quan qiu bian nuan de wei hai, List the damages resulting from global warming)". Little QE work focuses on complex queries, and the prevalent approaches proceed as follows: segment the original Chinese query Q into a vector Qa composed of multiple terms, then expand each term in Qa separately. These approaches ignore valuable information in the complex query, such as term combination and term co-occurrence. For example, the query above can be segmented as {(lie ju, list), (quan qiu, global), (bian nuan, warming), (wei hai, damage)}. Intuitively, "(quan qiu bian nuan, global warming)" expresses the user's intention better than "(quan qiu, global)" and "(wei hai, damage)"; moreover, the combined term "(quan qiu bian nuan, global warming)" has become a semantic unit with a more specific connotation than either single term "(quan qiu, global)" or "(bian nuan, warming)".
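The limitation of per-term expansion can be illustrated with a minimal sketch. It assumes the query has already been segmented into the English glosses given in the example, and it uses a hypothetical toy expansion table (`TOY_EXPANSIONS`) whose entries are invented for illustration only; the function names are likewise assumptions and do not correspond to any particular QE system.

```python
from typing import Dict, List

# Segmented example query Qa, using the English glosses from the text.
QA: List[str] = ["list", "global", "warming", "damage"]

# Hypothetical expansion table; entries are illustrative, not real data.
TOY_EXPANSIONS: Dict[str, List[str]] = {
    "global": ["worldwide"],
    "warming": ["heating"],
    "damage": ["harm"],
}


def expand_per_term(terms: List[str], table: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Expand every term independently of its neighbours."""
    return {t: [t] + table.get(t, []) for t in terms}


def adjacent_pairs(terms: List[str]) -> List[str]:
    """Combined terms formed from neighbouring segments of the query."""
    return [f"{a} {b}" for a, b in zip(terms, terms[1:])]


expanded = expand_per_term(QA, TOY_EXPANSIONS)
pairs = adjacent_pairs(QA)

print(expanded)
print(pairs)
# The combined unit appears among the adjacent pairs but is never expanded.
print("global warming" in pairs, "global warming" in expanded)
```

In this sketch the combined term "global warming" exists only as a pair of neighbouring segments; the per-term expansion never treats it as a unit, which is the information loss described above.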