GWUM: a usage-driven generalization of Web pages

Analyzing the usage of a Web site through extracted sequential patterns is often limited by the low support of these patterns, mainly because of the great diversity of pages and behaviors. However, most of these pages can be grouped into categories during preprocessing. Using these categories in place of the URLs then makes it possible to discover "generic" behaviors. This article presents a methodology for Web usage mining based on such a generalization of URLs. The generalization relies on a categorization of the URLs that uses information extracted from the users' accesses to these pages. We then present an experiment showing how the support of the extracted sequential patterns changes depending on whether the patterns are obtained with or without this generalization.

KEYWORDS: Web usage mining, sequential patterns, classification
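The effect of generalization on support can be illustrated with a minimal sketch. It is not the article's implementation: the sessions, the URLs, and the `url_to_category` mapping are hypothetical and written by hand, whereas the article derives its categories from information extracted from users' accesses. Support is taken here as the fraction of sessions that contain the pattern as an ordered subsequence, with gaps allowed.

```python
from typing import Dict, List, Sequence


def is_subsequence(pattern: Sequence[str], session: Sequence[str]) -> bool:
    """Return True if the items of `pattern` occur in `session` in order (gaps allowed)."""
    remaining = iter(session)
    # `item in remaining` advances the iterator, so order is enforced.
    return all(item in remaining for item in pattern)


def support(pattern: Sequence[str], sessions: List[Sequence[str]]) -> float:
    """Fraction of sessions that contain `pattern` as a subsequence."""
    if not sessions:
        return 0.0
    hits = sum(is_subsequence(pattern, s) for s in sessions)
    return hits / len(sessions)


def generalize(session: Sequence[str], url_to_category: Dict[str, str]) -> List[str]:
    """Replace each URL by its category; URLs without a category are kept as-is."""
    return [url_to_category.get(url, url) for url in session]


# Hypothetical navigation sessions (one list of requested URLs per user visit).
sessions = [
    ["/teams/axis/", "/pubs/2004/paper1.pdf"],
    ["/teams/orion/", "/pubs/2005/paper7.pdf"],
    ["/teams/axis/", "/jobs/"],
    ["/teams/odyssee/", "/pubs/2003/report2.pdf"],
]

# Hypothetical, hand-written categorization (an assumption for illustration only).
url_to_category = {
    "/teams/axis/": "team",
    "/teams/orion/": "team",
    "/teams/odyssee/": "team",
    "/pubs/2004/paper1.pdf": "publication",
    "/pubs/2005/paper7.pdf": "publication",
    "/pubs/2003/report2.pdf": "publication",
    "/jobs/": "jobs",
}

# Support of a pattern expressed on raw URLs.
url_support = support(["/teams/axis/", "/pubs/2004/paper1.pdf"], sessions)

# Support of the corresponding pattern expressed on categories.
generalized_sessions = [generalize(s, url_to_category) for s in sessions]
category_support = support(["team", "publication"], generalized_sessions)

print(f"support on URLs:       {url_support:.2f}")
print(f"support on categories: {category_support:.2f}")
```

In this toy data the URL-level pattern is supported by one session out of four, while the category-level pattern "team, then publication" is supported by three out of four, which is the kind of increase the generalization is meant to produce.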
