Social book search: comparing topical relevance judgements and book suggestions for evaluation

The Web and social media give us access to a wealth of information that differs not only in quantity but also in character: traditional descriptions from professionals are now supplemented with user-generated content. This challenges modern search systems based on the classical model of topical relevance and ad hoc search: how does their effectiveness transfer to the changing nature of information and to the changing types of information needs and search tasks? We use the INEX 2011 Books and Social Search Track's collection of book descriptions from Amazon and the social cataloguing site LibraryThing. We compare classical IR with social book search in the context of the LibraryThing discussion forums, where members ask for book suggestions. Specifically, we compare the book suggestions on the forum with Mechanical Turk judgements of topical relevance and recommendation, examining both the judgements themselves and the system evaluations they produce. Our findings are fourfold. First, the book suggestions on the forum form a sufficiently complete set of relevance judgements for system evaluation. Second, topical relevance judgements result in a different system ranking than evaluation based on the forum suggestions: although topical relevance is an important aspect of social book search, it is not sufficient for evaluation. Third, professional metadata alone is often not enough to determine the topical relevance of a book; user reviews provide a better signal for topical relevance. Fourth, user-generated content is more effective for social book search than professional metadata. Based on these findings, we propose an experimental evaluation that better reflects the complexities of social book search.
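To make the comparison of judgement sets concrete, the following is a minimal sketch (not the paper's actual pipeline) of how two sets of judgements can lead to different system rankings. The run data, the qrel sets, and the helper functions average_precision and mean_ap are all hypothetical and purely illustrative; the sketch assumes scipy is available and scores each run by mean average precision under forum-suggestion judgements and under Mechanical Turk topical-relevance judgements, then compares the two induced system rankings with Kendall's tau.

# Minimal, illustrative sketch: evaluate retrieval runs against two judgement
# sets and compare the resulting system rankings with Kendall's tau.
from scipy.stats import kendalltau

def average_precision(ranked_books, relevant):
    """Average precision of one ranked list against a set of relevant books."""
    hits, precision_sum = 0, 0.0
    for rank, book in enumerate(ranked_books, start=1):
        if book in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0

def mean_ap(run, qrels):
    """Mean average precision of a run (topic -> ranked list) over its topics."""
    return sum(average_precision(run[t], qrels.get(t, set())) for t in run) / len(run)

# Hypothetical runs and judgements, keyed by topic id.
runs = {
    "bm25_professional": {"t1": ["b1", "b2", "b3"], "t2": ["b9", "b4", "b5"]},
    "lm_reviews":        {"t1": ["b2", "b1", "b7"], "t2": ["b4", "b9", "b6"]},
    "tags_only":         {"t1": ["b7", "b3", "b2"], "t2": ["b6", "b5", "b4"]},
}
forum_qrels = {"t1": {"b2", "b7"}, "t2": {"b4", "b6"}}   # forum book suggestions
turk_qrels  = {"t1": {"b1", "b2"}, "t2": {"b4", "b9"}}   # MTurk topical relevance

systems = sorted(runs)
forum_scores = [mean_ap(runs[s], forum_qrels) for s in systems]
turk_scores  = [mean_ap(runs[s], turk_qrels) for s in systems]

# A low tau indicates the two judgement sets rank the systems differently.
tau, p_value = kendalltau(forum_scores, turk_scores)
print(f"Kendall's tau between the two system rankings: {tau:.2f} (p={p_value:.2f})")

In this toy setup the run favouring user reviews scores well under the forum suggestions but less well under the topical-relevance labels, which is the kind of divergence the ranking comparison is meant to expose.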

[1]  John Le,et al.  Ensuring quality in crowdsourced search relevance evaluation: The effects of training question distribution , 2010 .

[2]  Lois Mai Chan,et al.  Linking folksonomy to Library of Congress subject headings: an exploratory study , 2009, J. Documentation.

[3]  Adam Mathes,et al.  Folksonomies-Cooperative Classification and Communication Through Shared Metadata , 2004 .

[4]  Kwan Yi,et al.  A semantic similarity approach to predicting Library of Congress subject headings for social tags , 2010, J. Assoc. Inf. Sci. Technol..

[5]  M. de Rijke,et al.  Credibility Improves Topical Blog Post Retrieval , 2008, ACL.

[6]  Gabriella Kazai,et al.  In Search of Quality in Crowdsourcing for Search Engine Evaluation , 2011, ECIR.

[7]  Jakob Voß,et al.  Tagging, Folksonomy & Co - Renaissance of Manual Indexing? , 2007, ArXiv.

[8]  Michael K. Buckland,et al.  Vocabulary as a Central Concept in Library and Information Science , 1999, CoLIS.

[9]  F. W. Lancaster,et al.  Vocabulary control for information retrieval , 1972 .

[10]  Ellen M. Voorhees,et al.  Bias and the limits of pooling for large collections , 2007, Information Retrieval.

[11]  Xiaohua Hu,et al.  User tags versus expert-assigned subject terms: A comparison of LibraryThing tags and Library of Congress Subject Headings , 2010, J. Inf. Sci..

[12]  Elaine Svenonius,et al.  Unanswered questions in the design of controlled vocabularies , 1986, J. Am. Soc. Inf. Sci..

[13]  Craig MacDonald,et al.  Overview of the TREC 2006 Blog Track , 2006, TREC.

[14]  Stephen E. Robertson,et al.  A new rank correlation coefficient for information retrieval , 2008, SIGIR '08.

[15]  Bernardo A. Huberman,et al.  Usage patterns of collaborative tagging systems , 2006, J. Inf. Sci..

[16]  Norbert Fuhr,et al.  Overview and Results of the INEX 2009 Interactive Track , 2010, ECDL.

[17]  Matthew Lease,et al.  Crowdsourcing Document Relevance Assessment with Mechanical Turk , 2010, Mturk@HLT-NAACL.

[18]  Ellen M. Voorhees,et al.  The Philosophy of Information Retrieval Evaluation , 2001, CLEF.

[19]  Ricardo Baeza-Yates,et al.  Design and Implementation of Relevance Assessments Using Crowdsourcing , 2011, ECIR.

[20]  Catherine Sheldrick Ross,et al.  Finding without seeking: the information encounter in the context of reading for pleasure , 1999, Inf. Process. Manag..

[21]  Charles L. A. Clarke,et al.  Novelty and diversity in information retrieval evaluation , 2008, SIGIR '08.

[22]  Rajeev Motwani,et al.  The PageRank Citation Ranking : Bringing Order to the Web , 1999, WWW 1999.

[23]  Kara Reuter,et al.  Assessing aesthetic relevance: Children's book selection in a digital library , 2007, J. Assoc. Inf. Sci. Technol..

[24]  Jens Terliesner,et al.  Retrieval effectiveness of tagging systems , 2011, ASIST.

[25]  Gabriella Kazai,et al.  Crowdsourcing for book search evaluation: impact of hit design on comparative system ranking , 2011, SIGIR.

[26]  Gabriella Kazai,et al.  Effects of Social Approval Votes on Search Performance , 2009, 2009 Sixth International Conference on Information Technology: New Generations.