Vertical selection in the information domain of children

In this paper, we explore vertical selection methods for aggregated search in the specific domain of topics for children between 7 and 12 years old. We built a test collection consisting of 25 verticals, 3.8K queries, and relevance assessments mapping relevant verticals to queries for a large sample of these queries. We gathered relevance assessments by envisaging two aggregated search systems: one in which the Web vertical is always displayed, and one in which each vertical is assessed independently of the Web vertical. We show that the two approaches lead to different sets of relevant verticals and that the former is prone to a bias towards visually oriented verticals. In the second part of the paper, we estimate the size of the verticals for the target domain. We show that employing both global and domain-specific size estimates of the verticals leads to significant improvements for state-of-the-art vertical selection methods. We also introduce a novel vertical and query representation based on tags from social media and show that its use leads to significant performance gains.
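The abstract does not state which size estimator or vertical selection method was used. Purely as an illustration, the following Python sketch pairs a Lincoln-Petersen capture-recapture estimate of vertical size (a classic estimator for uncooperative collections) with a ReDDE-style scorer, a size-aware resource selection method in which each top-ranked sampled document votes for its vertical in proportion to the vertical's estimated size. The function names, the `domain_size` heuristic (global size scaled by the in-domain share of a sample), and all inputs are hypothetical assumptions, not the paper's method.

```python
from collections import Counter
from typing import Dict, List, Sequence, Set


def lincoln_petersen(sample_a: Set[str], sample_b: Set[str]) -> float:
    """Capture-recapture size estimate from two independent samples of document ids."""
    overlap = len(sample_a & sample_b)
    if overlap == 0:
        raise ValueError("samples do not overlap; size cannot be estimated")
    return len(sample_a) * len(sample_b) / overlap


def domain_size(global_size: float, sample: Sequence[str], in_domain: Set[str]) -> float:
    """Hypothetical domain-specific size: global size scaled by the in-domain share of a sample."""
    share = sum(1 for doc in sample if doc in in_domain) / len(sample)
    return global_size * share


def redde_scores(
    ranking: List[str],            # top documents retrieved from a centralized index of sampled documents
    doc_vertical: Dict[str, str],  # vertical each sampled document was drawn from
    sizes: Dict[str, float],       # estimated size of each vertical (global or domain-specific)
    sample_sizes: Dict[str, int],  # number of sampled documents per vertical
    k: int = 100,
) -> Dict[str, float]:
    """ReDDE-style scores: each top-k sampled document votes N_v / |S_v| for its vertical."""
    scores: Counter = Counter()
    for doc in ranking[:k]:
        vertical = doc_vertical[doc]
        scores[vertical] += sizes[vertical] / sample_sizes[vertical]
    total = sum(scores.values())
    # Normalize so the scores form a distribution over the verticals that received votes.
    return {v: s / total for v, s in scores.items()} if total else {}


if __name__ == "__main__":
    a = {f"d{i}" for i in range(60)}        # first sample: 60 documents
    b = {f"d{i}" for i in range(40, 100)}   # second sample: 60 documents, 20 shared with the first
    print(lincoln_petersen(a, b))           # 60 * 60 / 20
    print(redde_scores(
        ["x1", "x2", "y1"],
        {"x1": "video", "x2": "video", "y1": "news"},
        {"video": 900.0, "news": 200.0},
        {"video": 100, "news": 100},
    ))
```

Because the scorer takes the size estimates as a plain mapping, swapping global sizes for domain-specific ones is a one-argument change, which is one plausible way a domain-specific size estimate could be plugged into a size-aware selection method.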
