Mining Frequent Generalized Patterns for Web Personalization

In this paper, we present FGP, an algorithm that combines the strengths of an association rule mining algorithm (FP-Growth) and a generalized pattern mining algorithm (GPClose) to efficiently generate rules from transaction data. Our Frequent Generalized Pattern (FGP) algorithm assumes that all items appearing in a set of transactions belong to categories organized in a taxonomy. It takes as input the transaction database and the taxonomy of categories, and produces generalized association rules that contain transaction items, item categories, or both. The algorithm is particularly useful for personalizing web sites with continuously updated content, such as blog aggregators or news portals. In this context, the transaction database contains user click-stream information, and the hierarchy of item types is a thematic taxonomy of web pages. The algorithm generates frequent itemsets comprising both web pages and categories. These itemsets are used to generate association rules and, in turn, recommendations for users. We experimentally evaluate the proposed algorithm using web log data collected from a newspaper web site.
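To make the notion of a generalized itemset concrete, the following minimal sketch extends each click-stream session with the taxonomy ancestors of its pages and then counts itemsets by brute force. It is not the FGP algorithm, and it uses neither FP-Growth nor GPClose; it only illustrates the kind of input and output described above. The taxonomy, page URLs, session data, and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical taxonomy (child -> parent): pages map to categories,
# and categories map to broader categories.
TAXONOMY = {
    "/sports/football-final": "Sports",
    "/sports/tennis-open": "Sports",
    "/politics/election": "Politics",
    "Sports": "News",
    "Politics": "News",
}


def ancestors(item, taxonomy):
    """Return every ancestor of `item`, nearest first."""
    result = []
    while item in taxonomy:
        item = taxonomy[item]
        result.append(item)
    return result


def extend(transaction, taxonomy):
    """Add the ancestors of every item to the transaction."""
    extended = set(transaction)
    for item in transaction:
        extended.update(ancestors(item, taxonomy))
    return extended


def frequent_generalized_itemsets(transactions, taxonomy, min_support, max_size=2):
    """Count itemsets of up to `max_size` items over extended transactions.

    Brute-force enumeration for illustration only; itemsets that pair an
    item with one of its own ancestors are skipped as redundant.
    """
    counts = Counter()
    for transaction in transactions:
        items = sorted(extend(transaction, taxonomy))
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                if any(a in combo for i in combo for a in ancestors(i, taxonomy)):
                    continue
                counts[combo] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


# Hypothetical sessions: each is the set of pages one user visited.
sessions = [
    ["/sports/football-final", "/politics/election"],
    ["/sports/tennis-open", "/politics/election"],
    ["/sports/football-final"],
]

result = frequent_generalized_itemsets(sessions, TAXONOMY, min_support=2)
# The mixed itemset ("/politics/election", "Sports") has support 2,
# although no pair of concrete pages reaches that threshold.
```

In this toy data, no two specific pages co-occur in more than one session, yet the election page co-occurs with the Sports category in two. Generalizing over the taxonomy is what surfaces such a pattern, which matters when individual pages are short-lived.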
