An analysis of human factors and label accuracy in crowdsourcing relevance judgments