SOPHIA in the Enterprise Track

The W3C collection contains documents of several types. In our experiments we used only two of them: www and lists. Examples of www documents are drafts and final versions of official W3C documents, slides from presentations given by W3C members, and so on. Documents of the lists type are e-mails. We split each www document into 1000-word segments and treated every segment as a separate document. We did not split e-mails (lists-type documents).
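The preprocessing described above can be illustrated with a minimal sketch. It is not the code used in the experiments: the function names are hypothetical, words are assumed to be whitespace-separated tokens, and the final segment of a www document is assumed to be kept even when it is shorter than 1000 words. The text does not specify any of these details.

```python
from typing import List

SEGMENT_WORDS = 1000  # segment length stated in the text


def split_into_segments(text: str, segment_words: int = SEGMENT_WORDS) -> List[str]:
    """Split text into consecutive segments of at most `segment_words` words.

    Assumption: a "word" is any whitespace-separated token, and a shorter
    trailing segment is kept rather than merged or dropped.
    """
    words = text.split()
    return [
        " ".join(words[i:i + segment_words])
        for i in range(0, len(words), segment_words)
    ]


def prepare_document(doc_type: str, text: str) -> List[str]:
    """Return the units that are treated as separate documents.

    Hypothetical helper: www documents are segmented, lists documents
    (e-mails) are kept whole, and all other types are ignored.
    """
    if doc_type == "www":
        return split_into_segments(text)
    if doc_type == "lists":
        return [text]
    return []
```

Under these assumptions, a www document of 2500 words yields three units (1000, 1000 and 500 words), while an e-mail of any length yields exactly one.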