Data Mining for Web Personalization

In this chapter we present an overview of the Web personalization process, viewed as an application of data mining that requires support for all phases of a typical data mining cycle. These phases include data collection and preprocessing, pattern discovery and evaluation, and finally the application of the discovered knowledge in real time to mediate between the user and the Web. This view of the personalization process provides added flexibility in leveraging multiple data sources and in effectively using the discovered models in an automatic personalization system. The chapter provides a detailed discussion of the activities and techniques used at different stages of this cycle, including the preprocessing and integration of data from multiple sources, as well as the pattern discovery techniques that are typically applied to this data. We consider several classes of data mining algorithms used specifically for Web personalization, including techniques based on clustering, association rule discovery, sequential pattern mining, Markov models, and probabilistic mixture and hidden (latent) variable models. Finally, we discuss hybrid data mining frameworks that leverage data from a variety of channels to provide more effective personalization solutions.
