A Study on the Architecture for Text Categorization and Summarization

Most search techniques only check whether a word occurs in a document; they do not consider whether that occurrence is meaningful. Text categorization is the process in which a given document or set of documents is searched through, and text summarization is the process in which the subjectivity of the given documents is found. Combining the two can provide a more meaningful search technique. This paper proposes an architecture for document search that combines text categorization and text summarization. A Term Frequency and Inverse Document Frequency (TFIDF) style equation, combined with various machine learning techniques, is used for both text categorization and text summarization. A collection of “n” documents is considered and searched. Search results are displayed along with the subjectivity of each document, so that the user can quickly identify the wanted document.

Keywords: machine learning, text categorization, text summarization, TFIDF style equation.
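As a rough illustration of the kind of TFIDF-style scoring referred to above, the following minimal sketch ranks a small set of documents against a query term. It is not the proposed architecture: the whitespace tokenizer, the weighting tf × log(N / df), the function names (`tokenize`, `tf_idf_vectors`, `search`), and the sample documents are all assumptions made for illustration only.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tf_idf_vectors(documents):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty documents
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def search(query, documents):
    """Rank documents by the summed TF-IDF weight of the query terms.

    Returns (document index, score) pairs, best first, omitting documents
    whose score is zero.
    """
    vectors = tf_idf_vectors(documents)
    query_terms = tokenize(query)
    scores = [sum(vec.get(t, 0.0) for t in query_terms) for vec in vectors]
    ranked = sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in ranked if scores[i] > 0]


# Hypothetical sample collection (n = 3), not data from the paper.
docs = [
    "machine learning methods for text categorization",
    "a summary of the football match",
    "text summarization with machine learning",
]
results = search("categorization", docs)
```

Under this weighting, a term that appears in every document receives a weight of zero, so a document is ranked higher when it contains query terms that are rare in the rest of the collection, rather than merely because the word is present.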