Extracting semantic relations from query logs

In this paper we study a large query log of more than twenty million queries, with the goal of extracting the semantic relations implicitly captured in the actions of users who submit queries and click on answers. Previous query log analyses mostly considered only the queries, not the actions that followed them. First, we propose a novel way to represent queries in a vector space, based on a graph derived from the query-click bipartite graph. We then analyze the graph produced by our query log, showing that it is less sparse than previous results suggested and that almost all measures of these graphs follow power laws, which sheds some light on the behavior of searching users as well as on the distribution of topics that people look for on the Web. The representation we introduce allows us to infer interesting semantic relationships between queries. Second, we provide an experimental analysis of the quality of these relations, showing that most of them are relevant. Finally, we sketch an application that detects multitopical URLs.
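As a rough illustration of the idea of representing queries in a vector space built from click data, the sketch below turns each query into a vector over the URLs clicked for it and links two queries when their vectors overlap. The abstract does not specify the weighting or the similarity measure, so the click-count weights, the cosine similarity, the function names, and the toy log are all assumptions made for this example and should not be read as the paper's actual method.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def query_vectors(clicks):
    """Build one sparse vector per query from (query, url) click pairs.

    Assumption: each component is the raw number of clicks on that URL.
    """
    vectors = defaultdict(lambda: defaultdict(int))
    for query, url in clicks:
        vectors[query][url] += 1
    # Convert to plain dicts so later lookups cannot create new keys.
    return {q: dict(urls) for q, urls in vectors.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[url] for url, w in u.items() if url in v)
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def related_queries(vectors):
    """Return weighted edges between queries that share a clicked URL."""
    edges = {}
    for q1, q2 in combinations(sorted(vectors), 2):
        similarity = cosine(vectors[q1], vectors[q2])
        if similarity > 0:
            edges[(q1, q2)] = similarity
    return edges


# Hypothetical toy log: (query, clicked URL) pairs, invented for illustration.
toy_clicks = [
    ("jaguar car", "cars.example"),
    ("jaguar car", "cars.example"),
    ("jaguar price", "cars.example"),
    ("jaguar cat", "zoo.example"),
]

toy_vectors = query_vectors(toy_clicks)
toy_edges = related_queries(toy_vectors)
print(toy_edges)
```

In this toy log, only the two queries that led to clicks on the same URL end up connected. Comparing every pair of queries is quadratic in the number of queries, so a log of the size described here would call for an inverted index from URLs to queries rather than this brute-force loop.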
