Crowdsourcing Document Relevance Assessment with Mechanical Turk

We investigate human factors involved in designing effective Human Intelligence Tasks (HITs) for Amazon's Mechanical Turk. In particular, we assess document relevance to search queries via MTurk in order to evaluate search engine accuracy. Our study varies four human factors and measures the resulting cost, time, and accuracy of the assessments. While results are largely inconclusive, we describe important obstacles encountered, lessons learned, related work, and promising directions for future investigation. Our experimental data is also made publicly available for further study by the community.
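
As a rough illustration of the accuracy measurement mentioned above, the sketch below aggregates hypothetical MTurk relevance labels by majority vote and scores them against gold judgments. The data, label aggregation rule, and function names are illustrative assumptions, not the study's actual pipeline.

```python
# Minimal sketch (assumed, not the authors' pipeline): aggregate MTurk relevance
# labels per (query, document) pair by majority vote, then compute accuracy
# against gold relevance judgments (e.g., expert assessor labels).
from collections import Counter

# Hypothetical worker judgments: (query_id, doc_id) -> list of 0/1 relevance labels
worker_labels = {
    ("q1", "d1"): [1, 1, 0],
    ("q1", "d2"): [0, 0, 0],
    ("q2", "d1"): [1, 0, 1],
}

# Hypothetical gold relevance judgments for the same pairs
gold_labels = {
    ("q1", "d1"): 1,
    ("q1", "d2"): 0,
    ("q2", "d1"): 0,
}

def majority_vote(labels):
    """Return the most frequent label; ties break toward the smaller label."""
    counts = Counter(labels)
    return max(sorted(counts), key=counts.get)

def accuracy(worker_labels, gold_labels):
    """Fraction of judged pairs whose aggregated label matches the gold label."""
    pairs = [p for p in worker_labels if p in gold_labels]
    correct = sum(majority_vote(worker_labels[p]) == gold_labels[p] for p in pairs)
    return correct / len(pairs) if pairs else 0.0

if __name__ == "__main__":
    print(f"Assessment accuracy vs. gold: {accuracy(worker_labels, gold_labels):.2f}")
```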