Using Topic Ontologies and Semantic Similarity Data to Evaluate Topical Search

Improvement in information retrieval systems depends largely on the ability to evaluate them. Assessing the effectiveness of a retrieval system requires test collections. In traditional approaches, users or hired evaluators provide manual relevance assessments. However, this does not scale with the complexity and heterogeneity of the available digital information. This paper proposes using topic ontologies and semantic similarity data to reduce the effort required of human assessors in evaluating the rapidly expanding set of competing information retrieval methods. After providing experimental evidence that supports the validity of our approach, we illustrate its application with an example in which the proposed evaluation procedure is used to assess the effectiveness of topical retrieval systems.
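The abstract does not state which similarity measure is derived from the topic ontology. The following is a minimal sketch that assumes a tree-shaped ontology in which each topic has a count of documents filed under it. It uses one common choice, an information-theoretic tree similarity in the style of Lin: sigma(t1, t2) = 2 log P(t0) / (log P(t1) + log P(t2)). Here t0 is the lowest common ancestor of t1 and t2, and P(t) is the fraction of all documents in t's subtree. The toy ontology, the document counts, and the `semantic_precision` metric are hypothetical illustrations, not the paper's definitions. `semantic_precision` is the mean similarity between the query topic and the topics of the retrieved documents.

```python
import math

# Hypothetical toy topic ontology (a tree): each topic maps to its parent.
PARENT = {
    "Top": None,
    "Top/Science": "Top",
    "Top/Science/Physics": "Top/Science",
    "Top/Science/Biology": "Top/Science",
    "Top/Arts": "Top",
    "Top/Arts/Music": "Top/Arts",
}

# Hypothetical counts of documents filed directly under each topic.
# Assumes every topic's subtree holds at least one document (log(0) is undefined).
DOCS_AT = {
    "Top/Science/Physics": 40,
    "Top/Science/Biology": 40,
    "Top/Science": 20,
    "Top/Arts/Music": 80,
    "Top/Arts": 20,
}


def ancestors(topic):
    """Return the path from `topic` up to the root, inclusive."""
    path = []
    while topic is not None:
        path.append(topic)
        topic = PARENT[topic]
    return path


def subtree_prob(topic):
    """Fraction of all documents filed under `topic` or any of its descendants."""
    total = sum(DOCS_AT.values())
    under = sum(n for t, n in DOCS_AT.items() if topic in ancestors(t))
    return under / total


def lowest_common_ancestor(t1, t2):
    """Deepest topic that is an ancestor of both t1 and t2."""
    seen = set(ancestors(t1))
    for t in ancestors(t2):
        if t in seen:
            return t
    raise ValueError("topics are not in the same tree")


def tree_similarity(t1, t2):
    """Lin-style similarity: 1.0 for identical topics, 0.0 when the LCA is the root."""
    if t1 == t2:
        return 1.0
    p0 = subtree_prob(lowest_common_ancestor(t1, t2))
    return 2 * math.log(p0) / (math.log(subtree_prob(t1)) + math.log(subtree_prob(t2)))


def semantic_precision(query_topic, retrieved_topics):
    """Hypothetical graded precision: mean similarity of results to the query topic."""
    if not retrieved_topics:
        return 0.0
    total = sum(tree_similarity(query_topic, t) for t in retrieved_topics)
    return total / len(retrieved_topics)


if __name__ == "__main__":
    q = "Top/Science/Physics"
    print(round(tree_similarity(q, "Top/Science/Biology"), 3))  # 0.431
    results = ["Top/Science/Physics", "Top/Science/Biology", "Top/Arts/Music"]
    print(round(semantic_precision(q, results), 3))  # 0.477
```

The appeal of such a measure for evaluation is that a retrieved page filed under a sibling topic earns partial credit, while one in an unrelated branch earns none. This is the kind of graded judgment that would otherwise have to come from a human assessor.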
