Query Classification using Wikipedia's Category Graph

Wikipedia's category graph is a network of 300,000 interconnected category labels and can be a powerful resource for many classification tasks. However, its size and lack of order can make it difficult to navigate. In this paper, we present a new algorithm that efficiently exploits this graph to accurately rank classification labels given user-specified keywords. We highlight several possible variations of this algorithm and study their impact on the classification results in order to determine the optimal way to exploit the category graph. We implement our algorithm as the core of a query classification system and demonstrate its reliability using the KDD CUP 2005 and TREC 2007 competitions as benchmarks.
