Sources of evidence for vertical selection

Web search providers often include search services for domain-specific subcollections, called verticals, such as news, images, videos, job postings, company summaries, and artist profiles. We address the problem of vertical selection: predicting which verticals, if any, are relevant to a query issued to the search engine's main web search page. In contrast to prior query classification and resource selection tasks, vertical selection is associated with unique resources that can inform the classification decision. We consider three sources of evidence: (1) the query string, from which features are derived independently of external resources, (2) logs of queries previously issued directly to the vertical, and (3) corpora representative of vertical content. Our study covers 18 verticals, which differ in semantics, media type, size, and level of query traffic. We compare our method to prior work in federated search and retrieval effectiveness prediction. An in-depth error analysis reveals challenges specific to individual verticals and provides insight into vertical selection for future work.
