Text analysis and information retrieval of text data

Text summarization combines POS tagging, term-frequency analysis, and topical analysis to produce an insightful summary of one or more documents. A concise version of a document can be built from term frequency and inverse document frequency (TF-IDF). Summarization is useful for condensing newspaper articles or email correspondence into a short account, and for extracting key elements for a search engine. To reduce the size, sentences that are not close to the centroid (the vector of terms most representative of the document set) are left out of the output; that is, content unrelated to the centroid topic is pruned, so the output contains only the information that matters to the user. Large unstructured data can thus be converted into a form suitable for report writing, web page compaction, and book reviews. The resulting summary retains the significant information while being less than half the size of the original. The output should fully answer the user's query and be easy for the user to understand.
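The centroid-based pruning described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not a reference implementation: each sentence is treated as its own "document" when computing inverse document frequency, the centroid is taken to be the average TF-IDF vector, closeness is measured by cosine similarity, and POS tagging and topical analysis are left out entirely. The names `tokenize`, `summarize`, and the `ratio` parameter are hypothetical and do not come from the text.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def summarize(sentences, ratio=0.5):
    """Keep the sentences closest to the TF-IDF centroid, in original order.

    `ratio` (an assumed parameter) is the fraction of sentences to keep.
    """
    n = len(sentences)
    if n == 0:
        return []
    token_lists = [tokenize(s) for s in sentences]

    # Document frequency: in how many sentences does each term occur?
    df = Counter(t for tokens in token_lists for t in set(tokens))
    # Smoothed inverse document frequency (one common variant).
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}

    # One TF-IDF vector per sentence.
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        vectors.append({t: count * idf[t] for t, count in tf.items()})

    # Centroid: the average of all sentence vectors.
    centroid = Counter()
    for vec in vectors:
        for t, w in vec.items():
            centroid[t] += w / n

    # Rank sentences by similarity to the centroid and prune the rest.
    scores = [cosine(vec, centroid) for vec in vectors]
    keep = max(1, int(n * ratio))
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:keep]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    text = [
        "The cat sat on the mat.",
        "The cat chased the mouse.",
        "Stock prices fell sharply today.",
        "The cat slept on the mat.",
    ]
    for sentence in summarize(text, ratio=0.5):
        print(sentence)
```

In this toy input the sentence about stock prices shares no terms with the others, so it lies farthest from the centroid and is pruned. A smaller `ratio` would be needed to keep the summary strictly below half the original size.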