A generic Data Mart architecture to support Web mining

Visits to a Web site leave behind important information about the behavior of its visitors. This information is stored in log files, which can contain many records, only some of which are relevant. As a result, user behavior analysis becomes a complex and time-consuming task: to analyze Web site visits, the relevant information has to be filtered and studied efficiently. We introduce a generic Data Mart architecture to support advanced Web mining. It is based on a Star model and contains the relevant historical data from visits to the Web site. Its fact table contains various additive measures that support the intended data mining tasks, whereas the dimension tables store the parameters necessary for such analysis, e.g. the period of analysis and the range of pages within a session. This generic repository can store different kinds of information derived from visits to a Web site, such as the time spent on each page in a session and the sequence of pages in a session. Because the structure of the Data Mart allows further parameters describing visitors' navigation to be added, it provides a flexible research platform for various kinds of analysis. Based on these sources, user behavior can be characterized and stored in user behavior vectors that serve as input for data mining. For example, similar visits can be grouped together and typical user behavior identified, which supports both a better understanding of user behavior and improvements to the Web site. Applying the presented methodology to the Web site of a Chilean university shows its benefits: we analyzed visits to that site and identified clusters of typical visitors. The analysis of these clusters is used to improve online marketing activities.
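
The Star model and the user behavior vectors described above can be illustrated with a minimal sketch. It is not the schema used in the paper: the table names (`fact_visit`, `dim_page`, `dim_session`, `dim_time`), their columns, and the helpers `build_data_mart`, `seconds_per_page`, and `behavior_vector` are all assumptions made for illustration, and SQLite stands in for whatever database the authors used. The sketch shows one fact table with an additive measure (seconds spent on a page), dimension tables that act as analysis parameters, and a fixed-length vector of (page, time spent) pairs built from the first pages of a session.

```python
import sqlite3


def build_data_mart(conn):
    """Create a hypothetical star schema: one fact table, three dimensions."""
    conn.executescript("""
        CREATE TABLE dim_page    (page_id INTEGER PRIMARY KEY, url TEXT NOT NULL);
        CREATE TABLE dim_session (session_id INTEGER PRIMARY KEY, ip TEXT, agent TEXT);
        CREATE TABLE dim_time    (time_id INTEGER PRIMARY KEY, day TEXT, hour INTEGER);
        CREATE TABLE fact_visit (
            session_id    INTEGER REFERENCES dim_session(session_id),
            page_id       INTEGER REFERENCES dim_page(page_id),
            time_id       INTEGER REFERENCES dim_time(time_id),
            position      INTEGER,  -- order of the page within the session
            seconds_spent REAL      -- additive measure
        );
    """)


def seconds_per_page(conn, day):
    """Aggregate the additive measure per page for one day of analysis."""
    rows = conn.execute(
        """SELECT f.page_id, SUM(f.seconds_spent)
           FROM fact_visit AS f
           JOIN dim_time AS t ON t.time_id = f.time_id
           WHERE t.day = ?
           GROUP BY f.page_id""",
        (day,),
    ).fetchall()
    return dict(rows)


def behavior_vector(conn, session_id, n_pages):
    """Return the first n_pages (page_id, seconds_spent) pairs of a session.

    Shorter sessions are padded with (0, 0.0); this assumes page_id 0 is
    never assigned to a real page.
    """
    rows = conn.execute(
        """SELECT page_id, seconds_spent
           FROM fact_visit
           WHERE session_id = ?
           ORDER BY position
           LIMIT ?""",
        (session_id, n_pages),
    ).fetchall()
    return rows + [(0, 0.0)] * (n_pages - len(rows))
```

Vectors of equal length, built this way for every session, could then be passed to a clustering method to group similar visits. The padding value and the choice of `n_pages` are design decisions of this sketch, not of the source.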
