Knowledge Discovery using Text Mining: A Programmable Implementation on Information Extraction and Categorization

Electronic documents around the world now hold nearly all the information one could need. Yet as data banks grow worldwide and technology makes access easier, it has also become easier to overlook vital facts and figures that could lead to groundbreaking discoveries. This paper discusses in detail the implementation of Information Extraction and Categorization in a text mining application that we have developed. To extract terms from a document, we use a modified version of Porter’s Algorithm for inflectional stemming. To calculate term frequencies for categorization, we use a domain dictionary for the ‘Computer Science’ domain.
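
The pipeline summarized above (stem each term, then count the stems that appear in a domain dictionary) can be illustrated with a minimal Python sketch. It is not the implementation described in this paper: the suffix rules are a deliberately simplified stand-in for the modified Porter's Algorithm, and the names `DOMAIN_TERMS`, `inflectional_stem`, `term_frequencies`, and `domain_score`, as well as the four dictionary entries, are hypothetical choices made for illustration only.

```python
import re
from collections import Counter

# Hypothetical mini dictionary of stemmed 'Computer Science' terms.
# The actual domain dictionary used in the paper is not reproduced here.
DOMAIN_TERMS = {"algorithm", "compiler", "network", "database"}

# (suffix, replacement) pairs tried in order; assumed rules, not Porter's.
SUFFIX_RULES = (("sses", "ss"), ("ies", "y"), ("ing", ""), ("ed", ""), ("s", ""))


def inflectional_stem(word: str) -> str:
    """Strip a few common inflectional endings (plural, -ed, -ing)."""
    for suffix, replacement in SUFFIX_RULES:
        # Require at least three characters to remain after stripping.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            if suffix == "s" and word.endswith("ss"):
                return word  # keep words such as "class" unchanged
            return word[: len(word) - len(suffix)] + replacement
    return word


def term_frequencies(text: str) -> Counter:
    """Count stemmed tokens that appear in the domain dictionary."""
    tokens = re.findall(r"[a-z]+", text.lower())
    stems = (inflectional_stem(token) for token in tokens)
    return Counter(stem for stem in stems if stem in DOMAIN_TERMS)


def domain_score(text: str) -> float:
    """Fraction of tokens whose stem is a domain term (0.0 for empty text)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    return sum(term_frequencies(text).values()) / len(tokens)


sample = "Compilers and networks: networking algorithms use databases."
frequencies = term_frequencies(sample)
```

A document could then be assigned to the domain when `domain_score` exceeds some threshold; that threshold, like the scoring rule itself, is an assumption of this sketch rather than something stated in the paper.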