Identifying Semantic-Related Search Tasks in Query Log

Users often submit multiple related queries to accomplish a single search task. Identifying search tasks poses two challenges: 1) search tasks are often intertwined and may span anywhere from seconds to days; 2) queries triggered by semantically related search tasks may share few terms or clicked documents. To address these challenges, we exploit semantic features of named entities to improve the identification of semantically related search tasks. We propose a novel approach to learning a semantic distance function between pairs of queries. The approach uses the categories of named entities as regularization, reflecting the assumption that queries containing entities from the same category are more likely to belong to the same search task. Finally, semantically related search tasks are identified by hierarchical agglomerative clustering with the learned distance function. Experiments show that our approach significantly outperforms corresponding state-of-the-art methods.
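
As an informal illustration of the final clustering step, the following minimal sketch groups queries by average-linkage hierarchical agglomerative clustering over a pairwise distance. It is not the learned distance function described above: the distance here is a hand-written stand-in (Jaccard distance on query terms, discounted when two queries contain entities from the same category), and the names `ENTITY_CATEGORY`, `category_weight`, and `threshold` are hypothetical and chosen only for this example.

```python
from itertools import combinations

# Hypothetical entity-to-category lookup (an assumption for illustration only).
ENTITY_CATEGORY = {
    "paris": "city",
    "rome": "city",
    "python": "language",
    "java": "language",
}


def categories(query):
    """Return the set of entity categories mentioned in a query."""
    return {ENTITY_CATEGORY[t] for t in query.lower().split() if t in ENTITY_CATEGORY}


def distance(q1, q2, category_weight=0.5):
    """Hand-written stand-in for a learned distance between two queries.

    Jaccard distance on terms, scaled down by (1 - category_weight) when the
    two queries contain entities from at least one common category.
    """
    t1, t2 = set(q1.lower().split()), set(q2.lower().split())
    union = t1 | t2
    if not union:
        return 0.0  # two empty queries are treated as identical
    jaccard = 1.0 - len(t1 & t2) / len(union)
    if categories(q1) & categories(q2):
        return jaccard * (1.0 - category_weight)
    return jaccard


def cluster_tasks(queries, threshold=0.6):
    """Average-linkage agglomerative clustering, stopped at a distance threshold."""
    clusters = [[q] for q in queries]

    def linkage(a, b):
        # Mean pairwise distance between the members of two clusters.
        return sum(distance(x, y) for x in a for y in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        if linkage(clusters[i], clusters[j]) > threshold:
            break  # remaining clusters are too far apart to merge
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


if __name__ == "__main__":
    log = ["hotels in paris", "rome flights", "python tutorial", "java tutorial"]
    for task in cluster_tasks(log):
        print(task)
```

In this toy log, "hotels in paris" and "rome flights" share no terms, so term overlap alone would never merge them; the category discount is what brings them under the threshold. This mirrors, in a much cruder form, the role that entity categories play as regularization in the proposed approach.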