Combining Data Warehousing and Data Mining Techniques for Web Log Analysis

Enormous amounts of information about Web site user behavior are collected in Web server logs. However, this information is only useful if it can be queried and analyzed to provide high-level knowledge about user navigation patterns, a task that requires powerful techniques. This chapter presents a number of approaches that combine data warehousing and data mining techniques in order to analyze Web logs. After introducing the well-known click and session data warehouse (DW) schemas, the chapter presents the subsession schema, which allows fast queries on sequences of page visits. Then, the chapter presents the so-called "hybrid" technique, which combines DW Web log schemas with a data mining technique called Hypertext Probabilistic Grammars, thereby providing fast and flexible constraint-based Web log analysis. Finally, the chapter presents a "post-check enhanced" improvement of the hybrid technique.
