Understanding user's query intent with Wikipedia

Understanding the intent behind a user's query can help a search engine automatically route the query to the corresponding vertical search engines to obtain particularly relevant content, thus greatly improving user satisfaction. The query intent classification problem poses three major challenges: (1) intent representation, (2) domain coverage, and (3) semantic interpretation. Current approaches to predicting user intent mainly rely on machine learning techniques. However, meeting all of these challenges with statistical machine learning approaches is difficult and often requires considerable human effort. In this paper, we propose a general methodology for query intent classification. With very little human effort, our method can discover large quantities of intent concepts by leveraging Wikipedia, one of the best human knowledge bases. Wikipedia concepts are used as the intent representation space; each intent domain is thus represented as a set of Wikipedia articles and categories. The intent of any input query is identified by mapping the query into the Wikipedia representation space. Compared with previous approaches, our method achieves much better coverage when classifying queries in an intent domain, even though the number of seed intent examples is very small. Moreover, the method is very general and can be easily applied to various intent domains. We demonstrate its effectiveness in three different applications: travel, job, and person name. In each of the three cases, only a couple of seed intent queries are provided. We perform quantitative evaluations against two baseline methods, and the experimental results show that our method significantly outperforms the other methods in each intent domain.
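
To make the idea of a Wikipedia representation space concrete, the following is a minimal sketch and not the method evaluated in the paper. It assumes a toy concept inventory (`CONCEPTS`), a hand-picked travel domain (`TRAVEL_DOMAIN`), and a simple word-overlap mapping; all of these names and the scoring rule are hypothetical and chosen only for illustration.

```python
from typing import Dict, Set

# Hypothetical toy stand-in for Wikipedia: concept title -> words that evoke it.
CONCEPTS: Dict[str, Set[str]] = {
    "Hotel": {"hotel", "hotels", "lodging"},
    "Airline": {"airline", "flight", "flights"},
    "Employment": {"job", "jobs", "employment"},
    "Python (programming language)": {"python", "programming"},
}

# An intent domain is represented as a set of concepts (assumed by hand here).
TRAVEL_DOMAIN: Set[str] = {"Hotel", "Airline"}


def map_query_to_concepts(query: str) -> Set[str]:
    """Map a query to every concept that shares at least one word with it."""
    tokens = set(query.lower().split())
    return {title for title, words in CONCEPTS.items() if tokens & words}


def intent_score(query: str, domain: Set[str]) -> float:
    """Fraction of the query's matched concepts that belong to the domain."""
    matched = map_query_to_concepts(query)
    if not matched:
        return 0.0  # the query maps to no known concept
    return len(matched & domain) / len(matched)


if __name__ == "__main__":
    for q in ["cheap flights to paris", "hotel jobs", "python jobs"]:
        print(q, intent_score(q, TRAVEL_DOMAIN))
```

A query whose matched concepts all lie inside the domain set scores 1.0, while a query that only partly overlaps the domain receives a fractional score. A real system would replace the word-overlap mapping with retrieval over actual Wikipedia articles and categories.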
