Assessing System Agreement and Instance Difficulty in the Lexical

This paper presents a comparative evaluation among the systems that participated in the Spanish and English lexical sample tasks of SENSEVAL-2. The focus is on pairwise comparisons among systems to assess the degree to which they agree, and on measuring the difficulty of the test instances included in these tasks.