Warehousing and mining Web logs

Analyzing web logs for usage and access trends can give web site developers and administrators important information and can also help in creating adaptive web sites. Many existing tools generate fixed reports from web logs, but they typically do not allow ad-hoc analysis queries, and they cannot discover hidden access patterns embedded in the logs. We describe a relational OLAP (ROLAP) approach for creating a web-log warehouse, which is populated both from the web logs themselves and from the results of mining them. We also present a web-based tool for ad-hoc analytic queries on the warehouse. We discuss the design criteria that influenced our choice of dimensions, facts, and data granularity, and present the results of analyzing and mining the logs.
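To make the ROLAP idea concrete, the following is a minimal sketch of a star schema loaded from access-log lines and queried ad hoc. It is an illustration only, not the schema used in the paper: the table names (dim_page, dim_time, fact_access), the choice of one fact row per request, the hourly time granularity, the Common Log Format input, and the sample log lines are all assumptions made for this example.

```python
import re
import sqlite3
from datetime import datetime

# Hypothetical sample lines in Common Log Format (not taken from the paper).
SAMPLE_LOG = [
    '10.0.0.1 - - [10/Oct/1999:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.2 - - [10/Oct/1999:13:58:02 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.1 - - [10/Oct/1999:14:01:10 -0700] "GET /papers.html HTTP/1.0" 200 5120',
]

LINE_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)

# Assumed star schema: two dimension tables and one fact row per request.
SCHEMA = """
CREATE TABLE dim_page (page_id INTEGER PRIMARY KEY, url TEXT UNIQUE);
CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, day TEXT, hour INTEGER,
                       UNIQUE (day, hour));
CREATE TABLE fact_access (page_id INTEGER REFERENCES dim_page(page_id),
                          time_id INTEGER REFERENCES dim_time(time_id),
                          host TEXT, status INTEGER, bytes INTEGER);
"""


def build_warehouse(lines):
    """Parse log lines and load them into an in-memory star schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip lines that are not in the assumed format
        ts = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z")
        day, hour = ts.date().isoformat(), ts.hour
        size = 0 if m["size"] == "-" else int(m["size"])

        # Insert dimension rows if they are new, then look up their keys.
        conn.execute("INSERT OR IGNORE INTO dim_page (url) VALUES (?)", (m["url"],))
        conn.execute("INSERT OR IGNORE INTO dim_time (day, hour) VALUES (?, ?)",
                     (day, hour))
        page_id = conn.execute("SELECT page_id FROM dim_page WHERE url = ?",
                               (m["url"],)).fetchone()[0]
        time_id = conn.execute(
            "SELECT time_id FROM dim_time WHERE day = ? AND hour = ?",
            (day, hour)).fetchone()[0]
        conn.execute("INSERT INTO fact_access VALUES (?, ?, ?, ?, ?)",
                     (page_id, time_id, m["host"], int(m["status"]), size))
    return conn


def hits_per_page_per_hour(conn):
    """Example ad-hoc query: roll the facts up by page and hour."""
    return conn.execute("""
        SELECT p.url, t.day, t.hour, COUNT(*) AS hits, SUM(f.bytes) AS bytes
        FROM fact_access f
        JOIN dim_page p ON p.page_id = f.page_id
        JOIN dim_time t ON t.time_id = f.time_id
        GROUP BY p.url, t.day, t.hour
        ORDER BY t.day, t.hour, p.url
    """).fetchall()


if __name__ == "__main__":
    for row in hits_per_page_per_hour(build_warehouse(SAMPLE_LOG)):
        print(row)
```

Because the facts are kept at request granularity in this sketch, coarser views (per day, per page, per host) are ordinary GROUP BY queries over the same tables. Results of mining the logs, such as cluster labels for sessions, could be attached as a further dimension, although how the paper does this is not specified here.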
