Paper information - Text analysis and information retrieval of text data

Text summarization combines part-of-speech (POS) tagging, term frequency, and topical analysis to produce an insightful summary of one or more documents. A concise version of a text document can be built from term frequency and inverse document frequency. Text summarization is useful for condensing newspaper articles or email correspondence, and for extracting key elements for a search engine. To reduce the size, sentences that are not near the centroid are left out of the output; that is, data unrelated to the centroid topic is pruned, so the output contains only the important data useful to the user. Large unstructured data can thus be converted into a form suitable for report writing, web page compaction, and book reviews. The resulting summary contains the significant information and is less than half the size of the original. The output should fully satisfy the user's query and be understandable to the user.
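The centroid-based pruning described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `ratio` parameter, the regular-expression tokenizer, and the smoothed IDF formula are all assumptions chosen for illustration, and POS tagging and topical analysis are omitted. Each sentence is treated as a small document, weighted by term frequency and inverse document frequency, and only the sentences closest to the centroid vector are kept.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a sentence and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(sentences):
    """Return one sparse TF-IDF vector (a dict) per sentence."""
    token_lists = [tokenize(s) for s in sentences]
    n = len(token_lists)
    # Document frequency: the number of sentences each term appears in.
    df = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        # Smoothed IDF (an assumption) keeps every weight positive.
        vectors.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1.0)
                        for t, c in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def summarize(sentences, ratio=0.5):
    """Keep the sentences nearest the centroid, in their original order."""
    vectors = tfidf_vectors(sentences)
    # The centroid is the mean of all sentence vectors.
    centroid = Counter()
    for vec in vectors:
        for term, weight in vec.items():
            centroid[term] += weight / len(vectors)
    scores = [cosine(vec, centroid) for vec in vectors]
    keep = max(1, int(len(sentences) * ratio))
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    # Re-sort the selected indices so the summary reads in document order.
    return [sentences[i] for i in sorted(ranked[:keep])]
```

Calling `summarize(sentences, ratio=0.4)` on a list of sentence strings returns the retained sentences; a ratio below 0.5 matches the abstract's goal of a summary under half the original size, measured here by sentence count rather than by characters.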

Sheetal Chaudhari | Honey Gupta | Aveena Kottwani | Soniya Gogia
