Variable Length Markov Chains for Web Usage Mining

Web usage mining is usually defined as the discipline that develops techniques to model and study users’ Web navigation behavior by analyzing data obtained from user interactions with Web resources; see (Mobasher, 2006; Liu, 2007) for recent reviews of Web usage mining. When users access Web resources they leave behind a trace that is stored in log files; such traces are called clickstream records. Clickstream records can be preprocessed into time-ordered sessions of sequential clicks (Spiliopoulou et al., 2003), where a user session represents a trail the user followed through the Web space. The process of session reconstruction is called sessionizing.

Understanding user Web navigation behavior is a fundamental step in providing guidelines on how to improve users’ Web experience. In this context, a model able to represent usage data can be used to induce frequent navigation patterns, to predict future user navigation intentions, and to provide a platform for adapting Web pages to user-specific information needs (Anand et al., 2005; Eirinaki et al., 2007).

Techniques using association rules (Herlocker et al., 2004) or clustering methods (Mobasher et al., 2002) have been used in this context. Given a set of transactions, clustering techniques can be used, for example, to find user segments, and association rule techniques can be used, for example, to find important relationships among pages based on users’ navigational patterns. These methods have the limitation that the ordering of page views is not taken into account when modeling user sessions (Liu, 2007). Two methods that do take page view ordering into account are tree-based methods (Chen et al., 2003), used for prefetching Web resources, and Markov models (Borges et al., 2000; Deshpande et al., 2004), used for link prediction. Moreover, recent studies have examined the use of visualization techniques for discovering navigational trends from usage data (Chen et al., 2007a; Chen et al., 2007b).
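To make the notions of sessionizing and Markov-based link prediction concrete, the following is a minimal Python sketch, not the method of any of the cited works. It assumes clickstream records of the form (user, timestamp in seconds, page), splits each user's clicks into sessions with a 30-minute inactivity timeout (a common heuristic, chosen here as an assumption), and estimates a first-order Markov model from the resulting sessions. The function names `sessionize`, `transition_probabilities`, and `predict_next` are hypothetical, and a variable length Markov chain would condition on longer histories than the single page used here.

```python
from collections import defaultdict

# Assumed inactivity threshold in seconds; not taken from the text.
SESSION_TIMEOUT = 30 * 60


def sessionize(clicks, timeout=SESSION_TIMEOUT):
    """Group (user, timestamp, page) records into time-ordered sessions per user."""
    sessions = []
    last_seen = {}  # user -> timestamp of that user's previous click
    current = {}    # user -> session list currently being built
    for user, ts, page in sorted(clicks, key=lambda c: (c[0], c[1])):
        # Start a new session on a user's first click or after a long gap.
        if user not in current or ts - last_seen[user] > timeout:
            current[user] = []
            sessions.append(current[user])
        current[user].append(page)
        last_seen[user] = ts
    return sessions


def transition_probabilities(sessions):
    """Estimate first-order transition probabilities P(next page | current page)."""
    counts = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        for src, dst in zip(session, session[1:]):
            counts[src][dst] += 1
    probs = {}
    for src, successors in counts.items():
        total = sum(successors.values())
        probs[src] = {dst: n / total for dst, n in successors.items()}
    return probs


def predict_next(probs, page):
    """Return the most probable next page, or None if the page has no observed successor."""
    successors = probs.get(page)
    if not successors:
        return None
    return max(successors, key=successors.get)


# Toy clickstream (illustrative data only).
clicks = [
    ("u1", 0, "A"), ("u1", 60, "B"), ("u1", 120, "C"),
    ("u1", 10_000, "A"), ("u1", 10_050, "B"),
    ("u2", 5, "A"), ("u2", 30, "C"),
]
sessions = sessionize(clicks)
probs = transition_probabilities(sessions)
print(sessions)
print(predict_next(probs, "A"))
```

Because the model keeps the order of page views within each session, it captures exactly the information that the clustering and association rule approaches discard; the price is that a first-order model only remembers the current page.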

[1] Randy Goebel, et al. Visualizing Web Navigation Data with Polygon Graphs, 2007, 2007 11th International Conference Information Visualization (IV '07).

[2] Michalis Vazirgiannis, et al. Web site personalization based on link analysis and navigational patterns, 2007, TOIT.

[3] Tao Luo, et al. Discovery and Evaluation of Aggregate Usage Profiles for Web Personalization, 2004, Data Mining and Knowledge Discovery.

[4] Ivan Koychev. Experiments with Two Approaches for Tracking Drifting Concepts, 2007.

[5] Bamshad Mobasher, et al. Intelligent Techniques for Web Personalization, 2005, Lecture Notes in Computer Science.

[6] George Karypis, et al. Selective Markov models for predicting Web page accesses, 2004, TOIT.

[7] Mark Levene, et al. Testing the Predictive Power of Variable History Web Usage, 2007, Soft Comput.

[8] John Wang, et al. Data Warehousing and Mining: Concepts, Methodologies, Tools, and Applications, 2008.

[9] Philip Calvert, et al. Encyclopedia of Data Warehousing and Mining, 2006.

[10] Randy Goebel, et al. Visual Data Mining of Web Navigational Data, 2007, 2007 11th International Conference Information Visualization (IV '07).

[11] Jonathan L. Herlocker, et al. Evaluating collaborative filtering recommender systems, 2004, TOIS.

[12] Gill Bejerano. Algorithms for variable length Markov chain modeling, 2004, Bioinform.

[13] Mark Levene, et al. Data Mining of User Navigation Patterns, 1999, WEBKDD.

[14] Xin Chen, et al. A Popularity-Based Prediction Model for Web Prefetching, 2003, Computer.

[15] Ronald Fagin, et al. Comparing top k lists, 2003, SODA '03.

[16] Shen Jun-yi, et al. A new Markov model for Web access prediction, 2002.

[17] Mark Levene, et al. Evaluating Variable-Length Markov Chain Models for Analysis of User Web Navigation Sessions, 2007, IEEE Transactions on Knowledge and Data Engineering.

[18] Honghua Dai, et al. Inexact Field Learning Approach for Data Mining, 2009, Encyclopedia of Data Warehousing and Mining.

[19] Yun Sing Koh, et al. Finding Non-Coincidental Sporadic Rules Using Apriori-Inverse, 2006, Int. J. Data Warehous. Min.

[20] Anupriya Ankolekar, et al. The two cultures: mashing up web 2.0 and the semantic web, 2007, WWW '07.

[21] Mark Levene, et al. Generating Dynamic Higher-Order Markov Models in Web Usage Mining, 2005, PKDD.

[22] Myra Spiliopoulou, et al. A Framework for the Evaluation of Session Reconstruction Heuristics in Web-Usage Analysis, 2003, INFORMS J. Comput.