Web search queries have evolved into a language of their own. In this paper, we substantiate this claim by analyzing complex networks constructed from query logs. As in natural language, the word and phrase co-occurrence networks of queries exhibit a two-regime degree distribution, which reveals a small kernel and a very large periphery. Unlike natural language, however, where a large fraction of sentences are formed using only kernel words, most queries combine units from both the kernel and the periphery. The long mean shortest path of these networks further shows that peripheral units are typically connected through nodes in the kernel, which are in turn connected through multiple hops within the kernel. The extremely large periphery implies that the likelihood of encountering a new word or segment is much higher for queries than for natural language, making unseen queries much harder to process than unseen sentences.
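To make the network construction concrete, the following is a minimal sketch of a word co-occurrence network built from a handful of queries, together with the two quantities discussed above: node degree and mean shortest path. The toy queries, the function names `build_cooccurrence_network` and `mean_shortest_path`, and the choice to link every pair of words within a query are illustrative assumptions, not the construction used in the paper, which also considers phrase-level segments and real query logs.

```python
from collections import defaultdict, deque
from itertools import combinations


def build_cooccurrence_network(queries):
    """Link two words whenever they appear together in the same query.

    Assumption: whitespace tokenization and an undirected, unweighted graph.
    """
    graph = defaultdict(set)
    for query in queries:
        words = set(query.lower().split())
        for word in words:
            graph[word]  # register the word even if it has no neighbours
        for u, v in combinations(sorted(words), 2):
            graph[u].add(v)
            graph[v].add(u)
    return dict(graph)


def mean_shortest_path(graph):
    """Average hop count over all reachable ordered node pairs, using BFS."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbour in graph[node]:
                if neighbour not in dist:
                    dist[neighbour] = dist[node] + 1
                    queue.append(neighbour)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return total / pairs if pairs else 0.0


# Hypothetical toy queries standing in for a real query log.
toy_queries = [
    "cheap flights to paris",
    "cheap hotels in rome",
    "flights to tokyo",
    "hotels in lisbon",
]

network = build_cooccurrence_network(toy_queries)
degrees = {word: len(neighbours) for word, neighbours in network.items()}

# Words sorted by degree: frequent connectors first, rare words last.
for word, degree in sorted(degrees.items(), key=lambda item: -item[1]):
    print(word, degree)
print("mean shortest path:", mean_shortest_path(network))
```

In this toy graph, a word such as "cheap" plays a kernel-like role by linking many otherwise unrelated words, while place names such as "paris" or "lisbon" sit in the periphery and reach each other only through such connectors. On a real query log, one would plot the degree distribution on log-log axes to look for the two regimes.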