Measuring Diversity of a Domain-Specific Crawl

In this work we present various metrics to measure the diversity of a domain-specific crawl. We evaluate these metrics on a domain-specific crawl seeded with ODP (Open Directory Project) URLs and find that they are indeed able to capture diversity. We argue that these metrics can be used to compare seed sets and crawling strategies with respect to diversity.
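
The abstract does not spell out the metrics themselves. The sketch below is purely illustrative and is not the metric proposed in this work. One simple way to score the topical diversity of a crawl is the Shannon entropy of the distribution of crawled pages over topic labels, scaled to [0, 1]. It assumes each crawled page already carries a topic label (for example, an ODP category or the output of a topic classifier). The function name `normalized_entropy` and its `num_categories` parameter are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, Optional


def normalized_entropy(labels: Iterable[str], num_categories: Optional[int] = None) -> float:
    """Illustrative diversity score (hypothetical, not the paper's metric).

    Computes the Shannon entropy of the topic-label distribution of crawled
    pages, divided by log(k), where k is num_categories if given, otherwise
    the number of distinct labels observed. Returns 0.0 when all pages share
    one label and 1.0 when pages are spread evenly over all k categories.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    k = num_categories if num_categories is not None else len(counts)
    if num_categories is not None and num_categories < len(counts):
        raise ValueError("num_categories is smaller than the number of observed labels")
    if total == 0 or k < 2:
        return 0.0
    # Shannon entropy (natural log) of the empirical label distribution.
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    # The maximum possible entropy over k categories is log(k).
    return entropy / math.log(k)


# Two hypothetical crawls, each labeled against the same 5 assumed categories.
crawl_a = ["sports"] * 8 + ["health"] * 2
crawl_b = ["sports", "health", "travel", "finance", "science"] * 2
print(normalized_entropy(crawl_a, num_categories=5))  # skewed toward one topic
print(normalized_entropy(crawl_b, num_categories=5))  # evenly spread over all topics
```

Passing `num_categories` normalizes against the full category set, so a crawl that evenly covers only a few categories is not scored as maximally diverse; omitting it normalizes against the categories actually observed in the crawl.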
