A study of inter-annotator agreement for opinion retrieval
Evaluation of sentiment analysis, like large-scale IR evaluation, relies on human assessors to produce accurate judgments. Subjectivity is a problem for relevance assessment, and even more so for sentiment annotation. In this study we examine the degree to which assessors agree on sentence-level sentiment annotations. We show that inter-assessor agreement does not depend on document length or on the frequency of sentiment, but correlates positively with automated opinion retrieval performance. We also examine the individual annotation categories to determine which pose the most difficulty for annotators.
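The abstract does not specify how agreement was quantified, but a standard choice for multi-assessor, nominal sentiment labels is Krippendorff's alpha. The sketch below is a minimal illustration of that computation, not the study's own code; the function name, label values, and handling of missing labels are assumptions made for the example.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(annotations):
    """Krippendorff's alpha for nominal labels.

    `annotations` is a list of units (e.g. sentences); each unit is the list
    of labels assigned to it by the different assessors (None = missing).
    """
    # Coincidence matrix: within each unit, every ordered pair of labels from
    # different assessors contributes 1 / (m_u - 1), where m_u is the number
    # of labels available for that unit.
    coincidences = Counter()
    for unit in annotations:
        labels = [lab for lab in unit if lab is not None]
        m_u = len(labels)
        if m_u < 2:
            continue  # a unit with fewer than two labels carries no pairable information
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1.0 / (m_u - 1)

    # Marginal totals per label and grand total.
    n_c = Counter()
    for (a, _b), weight in coincidences.items():
        n_c[a] += weight
    n = sum(n_c.values())
    if n <= 1:
        return float("nan")

    # alpha = 1 - D_o / D_e for nominal data.
    observed_disagreement = sum(w for (a, b), w in coincidences.items() if a != b)
    expected_disagreement = sum(
        n_c[a] * n_c[b] for a in n_c for b in n_c if a != b
    ) / (n - 1)
    if expected_disagreement == 0:
        return 1.0
    return 1.0 - observed_disagreement / expected_disagreement


if __name__ == "__main__":
    # Two assessors labelling four sentences (hypothetical data).
    sentences = [
        ["positive", "positive"],
        ["negative", "negative"],
        ["positive", "neutral"],
        ["neutral", "neutral"],
    ]
    print(round(krippendorff_alpha_nominal(sentences), 3))
```

Computed this way, the coefficient is 1.0 under perfect agreement and falls toward 0 as assessor disagreement approaches what would be expected by chance, which makes it convenient for comparing agreement across annotation categories of the kind the study examines.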