Web Usage Mining in Search Engines

Search engine logs record not only navigation information but also the queries made by users. In particular, queries to a search engine follow a power-law distribution, which is far from uniform: a small number of distinct queries account for a large share of all queries submitted. Queries and their associated clicks can be used to improve the search engine itself in several respects: user interface, index performance, and answer ranking. In this chapter we present some of the main ideas proposed in query mining, and we show a few examples based on real data from a search engine focused on the Chilean Web.

INTRODUCTION

Given the rate of growth of the Web, scalability is a key issue for search engines, as the hardware and network resources they need are large and expensive. In addition, search engines are popular tools, so they face tight constraints on query answer time. Hence, using resources efficiently can improve both scalability and answer time. One tool to achieve these goals is Web mining.

In this chapter we focus on Web usage mining of logs of queries and user clicks to improve search engines and Websites. We do not consider other kinds of Web mining such as link analysis (Chakrabarti, 2002), content mining, or Web dynamics (Levene & Poulovassilis, 2003).

Few papers deal with the use of query logs to improve search engines, because this information is usually not disclosed. The exceptions deal with strategies for caching the index and/or the answers (Markatos, 2000; Saraiva et al., 2001; Xie &

This chapter appears in the book, Web Mining: Applications and Techniques, edited by Anthony Scime. Copyright © 2005, Idea Group Inc. Copying or distributing in print or electronic forms without written permission of Idea Group Inc. is prohibited.
IDEA GROUP PUBLISHING, 701 E. Chocolate Avenue, Suite 200, Hershey PA 17033-1240, USA. Tel: 717/533-8845; Fax: 717/533-8661; URL: http://www.idea-group.com
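The link between the power-law distribution of queries and the answer-caching strategies mentioned above can be illustrated with a small simulation. The sketch below is not taken from the chapter or from the cited caching papers: it draws a synthetic query stream from a Zipf (power-law) distribution and feeds it to a least-recently-used (LRU) cache of query answers, then repeats the experiment with a uniform stream for comparison. The helper names (`zipf_weights`, `LRUResultCache`, `simulate`), the exponent `s`, the number of distinct queries, and the cache capacity are all hypothetical choices made for illustration, and LRU is simply one common eviction policy, not necessarily the one used by Markatos or Saraiva et al.

```python
import random
from collections import OrderedDict


def zipf_weights(n, s=1.0):
    """Unnormalized power-law weights: the query of rank r gets weight 1 / r**s.

    s = 0 gives equal weights, i.e. a uniform query distribution.
    """
    return [1.0 / (r ** s) for r in range(1, n + 1)]


class LRUResultCache:
    """Hypothetical answer cache mapping a query string to its result page."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()
        self.hits = 0
        self.misses = 0

    def lookup(self, query):
        if query in self.store:
            self.store.move_to_end(query)  # mark as most recently used
            self.hits += 1
            return self.store[query]
        self.misses += 1
        result = f"results for {query}"  # stand-in for evaluating the query on the index
        self.store[query] = result
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict the least recently used query
        return result

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def simulate(n_distinct=10_000, n_queries=100_000, capacity=500, s=1.0, seed=42):
    """Hit rate of an LRU answer cache on a synthetic query stream (assumed parameters)."""
    rng = random.Random(seed)
    queries = [f"q{r}" for r in range(1, n_distinct + 1)]
    stream = rng.choices(queries, weights=zipf_weights(n_distinct, s), k=n_queries)
    cache = LRUResultCache(capacity)
    for q in stream:
        cache.lookup(q)
    return cache.hit_rate()


if __name__ == "__main__":
    # The cache holds answers for 5% of the distinct queries.
    print(f"power-law stream hit rate: {simulate(s=1.0):.2f}")
    print(f"uniform stream hit rate:   {simulate(s=0.0):.2f}")
```

With these assumed parameters, the power-law stream should yield a much higher hit rate than the uniform one, because the few very popular queries stay in the cache; under a uniform distribution the hit rate stays close to the fraction of distinct queries the cache can hold. Real logs would be needed to measure actual hit rates.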

[1] Sönke Lieberam-Schmidt et al. A new approach to relevancy in Internet searching - the "Vox Populi Algorithm", 2003, ArXiv.

[2] Yinglian Xie et al. Locality in search engine queries and its implications for caching, 2002, Proceedings of the Twenty-First Annual Joint Conference of the IEEE Computer and Communications Societies.

[3] Wagner Meira et al. Rank-preserving two-level caching for scalable search engines, 2001, SIGIR '01.

[4] Dell Zhang et al. A novel Web usage mining approach for search engines, 2002, Comput. Networks.

[5] Katsumi Takahashi et al. User behavior analysis of location aware search engine, 2002, Proceedings of the Third International Conference on Mobile Data Management (MDM 2002).

[6] Amanda Spink et al. Searching the Web: the public and their queries, 2001.

[7] Wei-Ying Ma et al. Log mining to improve the performance of site search, 2002, Proceedings of the Third International Conference on Web Information Systems Engineering (Workshops).

[8] Ricardo A. Baeza-Yates et al. Agents, Crawlers, and Web Retrieval, 2002, CIA.

[9] Amanda Spink et al. U.S. versus European web searching trends, 2002, SIGIR Forum.

[10] Chaomei Chen et al. Mining the Web: Discovering knowledge from hypertext data, 2004, J. Assoc. Inf. Sci. Technol.

[11] Evangelos P. Markatos et al. On caching search engine query results, 2001, Comput. Commun.

[12] Craig Silverstein et al. Analysis of a Very Large AltaVista Query Log, 1998, SRC Technical Note #1998-14.

[13] Peter Pirolli et al. Computational models of information scent-following in a very large browsable text collection, 1997, CHI.

[14] Mark Levene et al. Web Dynamics, 2004, Springer Berlin Heidelberg.

[15] Panayiotis Zaphiris et al. Advances in Universal Web Design and Evaluation: Research, Trends and Opportunities, 2006.

[16] Amanda Spink et al. From E-Sex to E-Commerce: Web Search Changes, 2002, Computer.

[17] Christoph Hölscher et al. Web search behavior of Internet experts and newbies, 2000, Comput. Networks.

[18] Panayiotis Zaphiris et al. Web Site Design for People with Dementia, 2007.

[19] Ji-Rong Wen et al. Clustering user queries of a search engine, 2001, WWW '01.

[20] Doug Beeferman et al. Agglomerative clustering of a search engine query log, 2000, KDD '00.

[21] Ricardo A. Baeza-Yates et al. A Three Level Search Engine Index Based in Query Log Distribution, 2003, SPIRE.