Mining web logs for actionable knowledge

Every day, popular Web sites attract millions of visitors. These visitors leave behind vast amounts of site traversal information in the form of Web server and query logs. By analyzing these logs, it is possible to discover various kinds of knowledge that can be applied to improve the performance of Web services. A particularly useful kind is knowledge that can be immediately applied to the operation of the Web sites; we call this type of knowledge actionable knowledge. In this chapter, we present three examples of actionable Web log mining. The first method mines a Web log for Markov models that can be used to improve caching and prefetching of Web objects. The second method uses the mined knowledge to build better, adaptive user interfaces that adjust as user behavior changes over time. Finally, we present an example of applying Web query log knowledge to improve Web search in a search engine application.
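To make the first method concrete, the following is a minimal sketch of a first-order Markov predictor built from sessionized log data. It is an illustration under stated assumptions, not the chapter's actual model: the session format, the function names `build_transition_counts` and `predict_next`, and the `min_prob` threshold are all hypothetical.

```python
from collections import Counter, defaultdict


def build_transition_counts(sessions):
    """Count page-to-page transitions over all sessions.

    `sessions` is assumed to be a list of page sequences, one per visitor
    session, already extracted from the raw server log.
    """
    counts = defaultdict(Counter)
    for pages in sessions:
        # Pair each page with the page requested immediately after it.
        for current, following in zip(pages, pages[1:]):
            counts[current][following] += 1
    return counts


def predict_next(counts, page, min_prob=0.5):
    """Return the most likely next page, or None if no follower is likely enough.

    `min_prob` is an illustrative threshold: a prefetcher would only fetch
    a page whose estimated transition probability reaches it.
    """
    followers = counts.get(page)
    if not followers:
        return None
    candidate, hits = followers.most_common(1)[0]
    prob = hits / sum(followers.values())
    return candidate if prob >= min_prob else None


# Toy sessions standing in for a parsed Web server log (made-up data).
sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/products", "/reviews"],
    ["/home", "/products", "/cart"],
    ["/home", "/about"],
]
counts = build_transition_counts(sessions)
```

With these toy sessions, `predict_next(counts, "/home")` suggests `/products` as the prefetch candidate, since three of the four observed transitions out of `/home` lead there. A higher-order model would condition on the last several pages rather than only the current one.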
