SOPHIA in Enterprise Track
The W3C collection contains documents of several types. In our experiments we used only two of them: www and lists. Documents of the www type include drafts and final versions of official W3C documents, slides from presentations given by W3C members, and so on. Documents of the lists type are e-mails. We split each www document into segments of 1000 words and treated every segment as a separate document. We did not split e-mails (lists documents).
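The preprocessing described above can be illustrated with a minimal sketch. It is not the authors' implementation: whitespace tokenization, the handling of a final segment shorter than 1000 words, the `(doc_id, doc_type, text)` input layout, and the `doc_id#n` naming of segments are all assumptions made here for illustration.

```python
def split_into_segments(text, segment_len=1000):
    """Split text into consecutive segments of at most segment_len words.

    Assumption: words are whitespace-separated tokens, and a shorter
    trailing segment is kept as its own segment.
    """
    words = text.split()
    return [
        " ".join(words[i:i + segment_len])
        for i in range(0, len(words), segment_len)
    ]


def prepare_documents(docs, segment_len=1000):
    """Build the list of documents used in the experiments.

    docs is an iterable of (doc_id, doc_type, text) tuples (hypothetical
    layout). Only 'www' and 'lists' documents are kept; 'www' documents
    are segmented, 'lists' documents (e-mails) are left whole.
    """
    prepared = []
    for doc_id, doc_type, text in docs:
        if doc_type == "www":
            for n, segment in enumerate(split_into_segments(text, segment_len)):
                # Hypothetical naming scheme for the resulting parts.
                prepared.append((f"{doc_id}#{n}", doc_type, segment))
        elif doc_type == "lists":
            prepared.append((doc_id, doc_type, text))
        # Any other document type is skipped.
    return prepared
```

Under these assumptions, a www document of 2500 words would yield three segments (1000, 1000, and 500 words), while an e-mail of any length would remain a single document.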