Query enrichment for web-query classification

Web-search queries are typically short and ambiguous. Classifying these queries into target categories is a difficult but important problem. In this article, we present a new technique called query enrichment, which takes a short query and maps it to intermediate objects. Based on the collected intermediate objects, the query is then mapped to target categories. To build the necessary mapping functions, we use an ensemble of search engines to produce an enrichment of the queries. Our technique was applied to the ACM Knowledge Discovery and Data Mining competition (ACM KDDCUP) in 2005, where we won the championship on all three evaluation metrics (precision; the F1 measure, which combines precision and recall; and creativity, which is judged by the organizers) among a total of 33 teams worldwide. In this article, we show that, despite the abundance of ambiguous queries and the lack of training data, our query-enrichment technique can solve the problem satisfactorily through a two-phase classification framework. We present a detailed description of our algorithm and its experimental evaluation. Our best results for F1 and precision are 42.4% and 44.4%, respectively, which are 9.6% and 24.3% higher than those of the runners-up.
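The two-phase idea described above (query to intermediate objects, then intermediate objects to target categories) can be illustrated with a small sketch. This is not the authors' implementation: the toy "engines", the label names, the `LABEL_TO_TARGET` table, and the functions `enrich` and `classify` are all hypothetical, and simple vote counting stands in for whatever mapping functions and ensemble scheme the article actually uses.

```python
from collections import Counter

# Hypothetical stand-ins for search engines: each maps a query to the
# directory labels (the "intermediate objects") of its top results.
TOY_ENGINES = {
    "engine_a": {"apple": ["Computers/Hardware", "Shopping/Electronics"]},
    "engine_b": {"apple": ["Computers/Hardware", "Food/Fruit"]},
}

# Hypothetical mapping from intermediate labels to target categories.
LABEL_TO_TARGET = {
    "Computers/Hardware": "Computers",
    "Shopping/Electronics": "Shopping",
    "Food/Fruit": "Living",
}


def enrich(query, engines):
    """Phase 1: collect intermediate labels for a query from every engine."""
    labels = []
    for results in engines.values():
        labels.extend(results.get(query, []))
    return labels


def classify(query, engines, label_to_target, top_k=2):
    """Phase 2: map intermediate labels to target categories by voting."""
    votes = Counter()
    for label in enrich(query, engines):
        target = label_to_target.get(label)
        if target is not None:
            votes[target] += 1
    # Return up to top_k target categories, most-voted first.
    return [category for category, _ in votes.most_common(top_k)]


if __name__ == "__main__":
    print(classify("apple", TOY_ENGINES, LABEL_TO_TARGET, top_k=1))
```

In this toy setup both engines return a hardware-related label for the ambiguous query "apple", so that category collects the most votes. A query that no engine recognizes yields an empty list, which is one simple way to signal that enrichment produced nothing to classify.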
