A fine-grained heuristic to capture web navigation patterns

In previous work we proposed a statistical model to capture user behaviour when browsing the web. The navigation information obtained from web logs is modelled as a hypertext probabilistic grammar (HPG), which belongs to the class of regular probabilistic grammars. The set of highest-probability strings generated by the grammar corresponds to the users' preferred navigation trails. We previously conducted experiments with a Breadth-First Search (BFS) algorithm that exhaustively computes all strings with probability above a specified cut-point; we call these strings the rules. Although the algorithm's running time varies linearly with the number of grammar states, it has two drawbacks: it returns a large number of rules when the cut-point is low, and a small set of very short rules when the cut-point is high. In this work we present a new heuristic that implements an iterative deepening search in which the rule-set is incrementally augmented by exploring high-probability trails first. A stopping parameter measures the distance between the current rule-set and the corresponding maximal set obtained by the BFS algorithm. When the stopping parameter is zero the heuristic corresponds to the BFS algorithm, and as the parameter approaches one the number of rules obtained decreases accordingly. Experiments were conducted with both real and synthetic data, and the results show that, for a given cut-point, the number of rules induced increases smoothly as the stopping parameter decreases. Therefore, by setting the value of the stopping parameter the analyst can determine the number and quality of the rules to be induced, where the quality of a rule is measured by both its length and its probability.
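To make the role of the cut-point concrete, the following is a minimal sketch of a breadth-first enumeration of trails over a toy grammar. It is an illustration under stated assumptions, not the authors' implementation: the function name `mine_rules`, the dictionaries `START` and `TRANS`, the toy probabilities, and the `max_length` guard are all hypothetical, and a "rule" is taken here to be a trail whose probability exceeds the cut-point and that cannot be extended by one more page without dropping to or below it. The stopping parameter of the new heuristic is not modelled, since the abstract does not define it precisely.

```python
from collections import deque

# Hypothetical toy grammar: START[p] is the probability that a trail begins
# at page p; TRANS[p][q] is the probability of following the link p -> q.
# Probability mass missing from a row is assumed to go to a final state.
START = {"A": 0.5, "B": 0.5}
TRANS = {"A": {"B": 0.6, "C": 0.3}, "B": {"C": 0.8}, "C": {}}


def mine_rules(start, trans, cut_point, max_length=10):
    """Breadth-first enumeration of trails with probability above cut_point.

    Returns a list of (trail, probability) pairs for trails that cannot be
    extended without falling to or below the cut-point. max_length is an
    assumed safety bound so that cycles with probability 1 cannot loop forever.
    """
    rules = []
    queue = deque(((page,), p) for page, p in start.items() if p > cut_point)
    while queue:
        trail, prob = queue.popleft()
        extended = False
        if len(trail) < max_length:
            for nxt, p in trans.get(trail[-1], {}).items():
                if prob * p > cut_point:
                    queue.append((trail + (nxt,), prob * p))
                    extended = True
        if not extended:
            # No admissible extension: the trail is reported as a rule.
            rules.append((trail, prob))
    return rules


if __name__ == "__main__":
    for cut in (0.35, 0.2, 0.1):
        found = mine_rules(START, TRANS, cut)
        print(cut, [" -> ".join(trail) for trail, _ in found])
```

On this toy grammar a cut-point of 0.35 yields two short rules, one of them a single page, whereas a cut-point of 0.1 yields three rules that are each at least two pages long. This mirrors, on a very small scale, the trade-off between rule count and rule length that the abstract describes for the BFS algorithm.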
