Incremental click-stream tree model: Learning from new users for web page prediction

As Web-based activity increases, predicting a user's next request has gained importance as a way to guide users during their visits to Web sites. Previously proposed recommendation methods use data collected over time to extract usage patterns. However, these patterns may change, because new log entries are added to the database each day and old entries are deleted. It is therefore highly desirable to update the recommendation model incrementally. In this paper, we propose a new model for modeling and predicting Web user sessions that attempts to reduce online recommendation time while retaining predictive accuracy. Because the model is easy to modify, it is updated during the recommendation process itself. The incremental algorithm yields better prediction accuracy as well as shorter online recommendation time. A performance evaluation of the Incremental Click-Stream Tree model on two different Web server access logs indicates that the proposed incremental model yields a significant speed-up in recommendation time and an improvement in prediction accuracy.
