Crowdsourcing for search evaluation

The Crowdsourcing for Search Evaluation Workshop (CSE 2010) was held on July 23, 2010, in Geneva, Switzerland, in conjunction with the 33rd Annual ACM SIGIR Conference. The workshop addressed recent advances in the theory and empirical methods of crowdsourcing for search evaluation, as well as novel applications of crowdsourcing to evaluating search systems. The program comprised three invited talks and seven refereed papers. The workshop proceedings and presentation slides are available online.
