Web Usage Mining as a Tool for Personalization: A Survey

This paper is a survey of recent work in the field of Web usage mining for the benefit of research on the personalization of Web-based information services. The essence of personalization is the adaptability of information systems to the needs of their users. This issue is becoming increasingly important on the Web, as non-expert users are overwhelmed by the quantity of information available online, while commercial Web sites strive to add value to their services in order to create loyal relationships with their visitors-customers. This article views Web personalization through the prism of personalization policies adopted by Web sites and implementing a variety of functions. In this context, the area of Web usage mining is a valuable source of ideas and methods for the implementation of personalization functionality. We therefore present a survey of the most recent work in the field of Web usage mining, focusing on the problems that have been identified and the solutions that have been proposed.
