Mining generalized query patterns from web logs

The user logs of a popular search engine record user activity, including the queries submitted, the click-throughs from the returned result list, and browsing behavior. Knowledge about user queries discovered from these logs can improve the search engine's performance. We propose a data-mining approach that produces generalized query patterns, or templates, from the raw user logs of a popular commercial knowledge-based search engine that is currently in use. Our simulation shows that such templates can improve the search engine's speed and precision and can cover queries that have not been asked before. The templates are also comprehensible, so web editors can easily discover the topics in which most users are interested.
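
The abstract does not spell out the mining algorithm, so the following is only an illustrative sketch of what a generalized query template might look like, not the authors' method. It assumes a hand-made word-to-concept hierarchy (`HIERARCHY`), a toy click log (`SAMPLE_LOG`), and a simple minimum-support threshold; all names and data here are hypothetical.

```python
from collections import Counter

# Hypothetical concept hierarchy: maps a specific word to a more general concept.
HIERARCHY = {
    "paris": "CITY",
    "london": "CITY",
    "tokyo": "CITY",
}

# Hypothetical log entries: (user query, page the user clicked).
SAMPLE_LOG = [
    ("weather in Paris", "weather_page"),
    ("weather in London", "weather_page"),
    ("Paris hotels", "hotel_page"),
]


def generalize(query, hierarchy):
    """Replace every word that has a known concept with that concept."""
    return tuple(hierarchy.get(word, word) for word in query.lower().split())


def mine_templates(log, hierarchy, min_support=2):
    """Return {template: (most-clicked page, count)} for sufficiently frequent pairs."""
    counts = Counter((generalize(query, hierarchy), page) for query, page in log)
    best = {}
    for (template, page), n in counts.items():
        # Keep a pair only if it is frequent enough and beats the current best.
        if n >= min_support and n > best.get(template, (None, 0))[1]:
            best[template] = (page, n)
    return best


templates = mine_templates(SAMPLE_LOG, HIERARCHY)
# A query never seen in the log can still match a mined template.
unseen = generalize("Weather in Tokyo", HIERARCHY)
print(unseen in templates)
```

In this toy setting the two weather queries collapse into one template, which then also matches a query about a city that never appeared in the log. This is the sense in which templates can cover queries not asked previously.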
