Analysis of Web Search Engine Query Sessions

In this paper we process and analyze web search engine query and click data from the perspective of the query session (a query plus its clicked results) conducted by the user. We first state hypotheses for possible user types and for quality profiles of the user session, based on descriptive variables of the session. The query dataset is preprocessed and analyzed using traditional statistical methods, and then processed with the Kohonen SOM clustering technique, which we use to produce a two-level clustering. The clusters are interpreted in terms of the user types and quality profiles defined initially. We then apply the C4.5 rule induction algorithm to predict session quality and user type, training on two months of click data and testing on data captured during a third, consecutive month. The objective of the work is to apply a systematic data mining process to click data, contrasting unsupervised (Kohonen) and supervised (C4.5) methods for clustering and modeling the data, in order to identify profiles and rules that relate to theoretical user behavior and user session “quality”.
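
To make the unsupervised step concrete, the following is a minimal sketch of Kohonen SOM training on session feature vectors. It is an illustration under stated assumptions, not the configuration used in this work: the two features (a normalized click count and a normalized time on results), the 1 x 2 map size, the linear decay schedules, and the names train_som and best_matching_unit are all hypothetical.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid position of the unit whose weight vector is closest to x."""
    return min(
        weights,
        key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], x)),
    )


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a small rectangular Kohonen map; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    sigma0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighbourhood radius shrinks
        for x in rng.sample(data, len(data)):    # visit sessions in random order
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
    return weights


# Hypothetical toy sessions: [normalized click count, normalized time on results].
sessions = [
    [0.10, 0.10], [0.15, 0.05], [0.05, 0.12],
    [0.90, 0.95], [0.85, 0.90], [0.95, 0.88],
]
som = train_som(sessions, rows=1, cols=2)
assignments = [best_matching_unit(som, s) for s in sessions]
```

Each session is assigned to its best matching unit, and the resulting unit memberships are what would then be interpreted against the hypothesized user types and quality profiles. A second clustering level could be sketched by training a further map on the sessions assigned to a single first-level unit, although that is only one possible reading of a two-level scheme.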