Information Content Measures of Semantic Similarity Perform Better Without Sense-Tagged Text

This paper presents an empirical comparison of similarity measures for pairs of concepts based on Information Content. It shows that using modest amounts of untagged text to derive Information Content results in higher correlation with human similarity judgments than using the largest available corpus of manually annotated sense-tagged text.
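As an illustration of what deriving Information Content from untagged text can look like, the following minimal sketch uses the standard definition IC(c) = -log P(c), where P(c) is estimated from corpus counts propagated up a concept hierarchy. The taxonomy, the sense inventory, the token list, and all function names are hypothetical, and splitting each ambiguous word's count evenly among its senses is only one possible counting scheme, not necessarily the one evaluated in this paper. The similarity function shown takes the Information Content of the most informative shared ancestor, in the style of Resnik's measure.

```python
import math
from collections import Counter

# Hypothetical toy taxonomy: concept -> parent concept (the root has None).
PARENT = {
    "entity": None,
    "animal": "entity",
    "artifact": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "artifact",
}

# Hypothetical sense inventory: surface word -> concepts it may denote.
SENSES = {
    "dog": ["dog"],
    "cat": ["cat"],
    "car": ["car"],
    "jaguar": ["cat", "car"],  # ambiguous word with two candidate senses
}


def ancestors(concept):
    """Return the concept followed by all of its ancestors up to the root."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain


def concept_counts(tokens):
    """Count concepts from untagged tokens.

    Assumption: with no sense tags available, each occurrence of a word is
    split evenly among its possible senses, and every share is also credited
    to all ancestors of that sense.
    """
    counts = Counter()
    for tok in tokens:
        senses = SENSES.get(tok, [])
        for sense in senses:
            share = 1.0 / len(senses)
            for concept in ancestors(sense):
                counts[concept] += share
    return counts


def information_content(counts, root="entity"):
    """IC(c) = -log P(c), with P(c) = count(c) / count(root)."""
    total = counts[root]
    return {c: -math.log(n / total) for c, n in counts.items() if n > 0}


def resnik_similarity(ic, a, b):
    """Information Content of the most informative shared ancestor."""
    anc_a = set(ancestors(a))
    common = [c for c in ancestors(b) if c in anc_a]
    return max(ic.get(c, 0.0) for c in common)


# Hypothetical untagged "corpus" of five tokens.
tokens = ["dog", "dog", "cat", "car", "jaguar"]
ic = information_content(concept_counts(tokens))
```

In this toy setting the root concept receives all of the probability mass, so its Information Content is zero, and more specific concepts such as `dog` receive higher values than their ancestors.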