Web Log Mining

In designing and implementing an Intelligent Web Information System (IWIS), one must consider the learning and discovery functionalities that produce the knowledge the system requires. Web log files are a valuable resource for discovering such knowledge. In the context of IWIS, we present a brief survey of Web log mining. We first give an overview of the more general topic of Web mining. We then review Web log mining by focusing on three important aspects: data preparation, Web log mining, and applications.
