Metrics for information retrieval: A case study

Clustering methods are widely used in information retrieval (IR). Clustering groups a set of documents into clusters, or subsets. Extracting relevant documents from the World Wide Web efficiently and effectively remains a challenging problem. In this work, we compare and analyse the effectiveness of four similarity measures (City Block distance, Cosine similarity, Point symmetry distance and Dice coefficient) for document clustering, with and without the use of an ontology. The study has two objectives: to compare the metrics in this domain, and to study the impact of methods such as ontology comparison and clustering on the metrics as a whole. This should lead to further refinement of the metrics for current and future needs in the domain. Earlier work in the domain reported that the similarity measures yield more or less the same results. However, our work shows that ontology-based clustering produces marked changes in the results. The results indicate that more work needs to be focused on the metrics aspect of information retrieval. (5 pages)
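As an illustration of three of the measures named above, the following minimal sketch computes City Block distance, Cosine similarity and the Dice coefficient for two documents represented as term-frequency vectors. The vector representation, the function names and the sample vectors are assumptions made for illustration; they are not taken from the paper. The Dice coefficient is shown in its common vector form, 2(a·b) / (|a|² + |b|²). Point symmetry distance is omitted because it depends on cluster centres, which the abstract does not describe.

```python
import math


def city_block(a, b):
    """City Block (Manhattan) distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def dice_coefficient(a, b):
    """Vector form of the Dice coefficient: 2(a.b) / (|a|^2 + |b|^2)."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b)
    return 2 * dot / denom if denom else 0.0


# Hypothetical term-frequency vectors over a shared three-term vocabulary.
doc_a = [1, 0, 2]
doc_b = [1, 1, 0]

print("City Block distance:", city_block(doc_a, doc_b))
print("Cosine similarity:  ", round(cosine_similarity(doc_a, doc_b), 4))
print("Dice coefficient:   ", round(dice_coefficient(doc_a, doc_b), 4))
```

Note that City Block is a distance (smaller means more alike), whereas Cosine and Dice are similarities (larger means more alike), so the values have to be put on a common footing before the measures are compared within a clustering algorithm.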