Classifying search engine queries using the web as background knowledge

The performance of a search engine depends crucially on its ability to capture the meaning that the user most likely intended for a query. We study the problem of mapping a search engine query to those nodes of a given subject taxonomy that characterize its most likely meanings. We describe the architecture of a classification system that uses a web directory to identify the subject contexts in which the query terms are frequently used. Based on its performance in classifying 800,000 example queries recorded from MSN search, the system received the Runner-Up Award for Query Categorization Performance of the KDD Cup 2005.
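The general idea of using a web directory as background knowledge can be illustrated with a minimal sketch. It is not the system's actual architecture, which the abstract does not detail. It assumes that each query term can be looked up in a directory to obtain category paths, and that a hand-made table maps directory categories to target taxonomy nodes. The data, the node names, and the names `DIRECTORY_HITS`, `CATEGORY_TO_NODE`, and `classify` are all hypothetical.

```python
from collections import Counter

# Hypothetical toy data: directory category paths returned when each
# query term is looked up in a web directory (an assumption, not real output).
DIRECTORY_HITS = {
    "jaguar": [
        "Recreation/Autos/Makes",
        "Science/Biology/Mammals",
        "Recreation/Autos/Makes",
    ],
    "price": [
        "Shopping/Vehicles",
        "Recreation/Autos/Makes",
    ],
}

# Hypothetical mapping from directory category prefixes to nodes of the
# target subject taxonomy.
CATEGORY_TO_NODE = {
    "Recreation/Autos": "Autos",
    "Science/Biology": "Nature",
    "Shopping": "Shopping",
}


def classify(query, top_k=2):
    """Return up to top_k (taxonomy node, vote share) pairs for a query."""
    votes = Counter()
    for term in query.lower().split():
        for category in DIRECTORY_HITS.get(term, []):
            # Each directory hit votes for the taxonomy node whose
            # prefix matches its category path.
            for prefix, node in CATEGORY_TO_NODE.items():
                if category.startswith(prefix):
                    votes[node] += 1
    total = sum(votes.values())
    if total == 0:
        return []  # no background knowledge available for this query
    return [(node, count / total) for node, count in votes.most_common(top_k)]


if __name__ == "__main__":
    for node, share in classify("jaguar price"):
        print(f"{node}: {share:.2f}")
```

In this toy setting the ambiguous term "jaguar" is pulled toward the automotive node because "price" co-occurs with automotive directory categories. A real system would replace the static dictionaries with directory search results and a more careful scoring scheme.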
