Text mining in 'Request for Comments Document Series'

This paper presents a knowledge discovery in text (KDT) system for the 'Request for Comments (RFC) Document Series'. It proposes a versatile system architecture for text mining in RFCs that maintains both the structured and the unstructured data components of each document. Documents are represented by keywords, and knowledge discovery is performed by analysing the co-occurrence frequencies of the keywords that represent each document. Documents are clustered using the extracted knowledge, which can reduce the search space. The documents retrieved for a query are ranked by the relevance of the query topic within each document. The paper also describes RFC Viewer, our tool for viewing an RFC document in rich text format rather than plain text; the tool also presents the knowledge extracted from the document and supports various KDD operations on it.
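
As an illustration of the co-occurrence analysis mentioned above, the following is a minimal sketch, assuming each document has already been reduced to a list of keywords. The function name cooccurrence_counts, the document identifiers, and the keyword lists are hypothetical and chosen only for illustration; they are not taken from the system described in this paper, whose exact counting scheme may differ.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(doc_keywords):
    """Count, for each unordered keyword pair, how many documents contain both."""
    counts = Counter()
    for keywords in doc_keywords.values():
        # sorted(set(...)) drops duplicate keywords and fixes the order
        # inside each pair, so ("a", "b") and ("b", "a") share one key.
        for pair in combinations(sorted(set(keywords)), 2):
            counts[pair] += 1
    return counts


# Hypothetical keyword sets (assumption: not data from the paper).
docs = {
    "doc_a": ["tcp", "congestion", "retransmission"],
    "doc_b": ["tcp", "congestion", "window"],
    "doc_c": ["dns", "resolver", "tcp"],
}

counts = cooccurrence_counts(docs)
for pair, n in counts.most_common(3):
    print(pair, n)
```

Keyword pairs with high counts could then serve as one possible signal for grouping documents into clusters, in the spirit of the clustering step described above.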