Cut-off time calculation for user session identification by reference length

One task of web log mining is discovering the behavior patterns of web site visitors. These patterns, represented as sequence rules, can be used to modify and improve the organization's web site. Data for the analysis come from the web server log file. Because these data are anonymous, uniquely identifying the web site visitor is a problem. This paper deals with the less commonly used navigation-driven methods of user session identification. These methods assume that the user passes through several navigation pages during a visit until she/he finds the content page with the required information. A content page is a page on which the user spends considerably more time than on navigation pages, and it is considered to be the end of the session. The search for the next content page through navigation pages constitutes a new user session. The division of pages into content and navigation pages is based on the calculation of the cut-off time C. For this calculation it is essential to verify that the variable representing the time a user spends on a particular page follows an exponential distribution. We prepared an experiment with data obtained from the log file of a university web server. We tested whether the time spent on web pages follows an exponential distribution, and we estimated the value of the cut-off time. The results support our assumption that navigation-oriented methods can be used for proper user session identification.
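The abstract does not state the formula for C, so the following is only a minimal sketch of how the reference length approach is commonly formulated, not the authors' implementation. It assumes that page-view times are exponentially distributed with rate λ estimated as the reciprocal of the mean observed time, and that an analyst-chosen share γ of all page views belongs to navigation pages, which gives C = -ln(1 - γ) / λ. The names `cutoff_time`, `split_sessions`, and `gamma`, as well as the sample data, are hypothetical.

```python
import math


def cutoff_time(times, gamma):
    """Estimate the cut-off time C under an exponential model.

    times: observed times spent on pages (e.g. in seconds).
    gamma: assumed proportion of navigation pages, 0 < gamma < 1.
    """
    if not times:
        raise ValueError("times must not be empty")
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie strictly between 0 and 1")
    # Maximum-likelihood estimate of the exponential rate: 1 / mean.
    lam = len(times) / sum(times)
    # C is the gamma-quantile of the exponential distribution.
    return -math.log(1.0 - gamma) / lam


def split_sessions(visits, cutoff):
    """Split one visitor's ordered page views into sessions.

    visits: list of (page, time_spent) pairs in chronological order.
    A page viewed longer than the cut-off is treated as a content
    page and closes the current session.
    """
    sessions, current = [], []
    for page, spent in visits:
        current.append(page)
        if spent > cutoff:
            sessions.append(current)
            current = []
    if current:
        # Trailing navigation pages with no content page at the end.
        sessions.append(current)
    return sessions


# Hypothetical data for illustration only.
c = cutoff_time([10, 20, 30], gamma=0.5)
visits = [("a", 5), ("b", 100), ("c", 3), ("d", 50), ("e", 2)]
sessions = split_sessions(visits, cutoff=30)
```

In practice the exponential assumption should be checked first, for example with a goodness-of-fit test, since C computed this way is meaningful only if the model fits the observed times.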
