Detecting session boundaries on the Web is important for several reasons. Firstly, boundaries establish a common context for statistics on user sessions and on the frequency of user activities. Secondly, and more specifically, they allow related information to be grouped together for other applications, such as learning techniques for adaptive search engines. To date, however, the notion of a session on the Web has not been consistently defined, if at all. The tendency has been to treat all the log data available from one user or IP address as a single session, regardless of the length of time the logs cover. This approach lacks a user-oriented view. Our argument is that a session on the Web can be defined as a group of user activities that are related to each other not only through an evolving information need but also through close proximity in time. We therefore describe and discuss an investigation based on two Web transaction logs (Excite and AltaVista), with a view to structuring user activities into sessions, or units, for subsequent use in user-oriented learning techniques. The paper describes the methodology and the experiments performed, followed by the results and a discussion. The results point to a threshold of 10-15 minutes between user activities as an appropriate session interval. The implications and limitations of these results, as well as differences from traditional IR systems, are also discussed.
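The time-proximity criterion can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the authors' procedure. It assumes each log record is a hypothetical `(user or IP identifier, timestamp, query)` tuple, and `split_sessions` and its `gap` parameter are illustrative names. A new session begins whenever two consecutive activities from the same identifier are separated by more than the threshold. The default of 15 minutes is simply the upper end of the 10-15 minute range reported above. The sketch models only the time criterion and does not attempt to track an evolving information need.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# Hypothetical log record: (user or IP identifier, timestamp, query string).
LogRecord = Tuple[str, datetime, str]


def split_sessions(records: List[LogRecord],
                   gap: timedelta = timedelta(minutes=15)) -> List[List[LogRecord]]:
    """Group log records into sessions using an inactivity threshold.

    Records are first grouped by identifier and sorted by time; a new
    session starts whenever the time since that identifier's previous
    activity exceeds `gap` (an assumed, tunable parameter).
    """
    # Group records by user or IP identifier, preserving first-seen order.
    by_user: Dict[str, List[LogRecord]] = {}
    for rec in records:
        by_user.setdefault(rec[0], []).append(rec)

    sessions: List[List[LogRecord]] = []
    for user_records in by_user.values():
        user_records.sort(key=lambda r: r[1])  # chronological order
        current = [user_records[0]]
        # Compare each activity with the one immediately before it.
        for prev, rec in zip(user_records, user_records[1:]):
            if rec[1] - prev[1] > gap:
                sessions.append(current)  # gap too long: close the session
                current = []
            current.append(rec)
        sessions.append(current)
    return sessions
```

For example, with the default 15-minute threshold, two queries from one user five minutes apart fall into the same session, while a third query 35 minutes after the second starts a new one. Re-running the function with different `gap` values is one simple way to see how sensitive the resulting session counts are to the chosen threshold.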