Investigation of partial query proximity in web search

Proximity of query terms in a document is an important criterion in IR. However, no investigation has been made to determine the most useful term sequences for which proximity should be considered. In this study, we test the effectiveness of using proximity of partial term sequences (n-grams) for Web search. We observe that the proximity of sequences of 3 to 5 terms is most effective for long queries, while shorter or longer sequences appear less useful. This suggests that combinations of 3 to 5 terms can best capture the intention in user queries. In addition, we also experiment with weighing the importance of query sub-sequences using query log frequencies. Our preliminary tests show promising empirical results.