Complete Pre Processing Phase of Punjabi Text Extractive Summarization System

Text summarization condenses a source text into a shorter form while retaining its information content and overall meaning. The Punjabi text summarization system is an extraction-based system that summarizes Punjabi text by retaining the relevant sentences, selected on the basis of statistical and linguistic features of the text. It comprises two main phases: 1) pre-processing and 2) processing. Pre-processing produces a structured representation of the original Punjabi text. In processing, the final score of each sentence is determined using a feature-weight equation, and the top-ranked sentences are selected, in their proper order, for the final summary. This paper concentrates on the complete pre-processing phase of the Punjabi text summarization system. The pre-processing phase includes Punjabi word boundary identification, Punjabi sentence boundary identification, Punjabi stop word elimination, a Punjabi language stemmer for nouns and proper names, ensuring that the input is in the proper format, and elimination of duplicate sentences.
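
To make the pre-processing steps listed above more concrete, the following is a minimal Python sketch of four of them: sentence boundary identification, word boundary identification, stop word elimination, and duplicate sentence elimination. It is an illustration under stated assumptions, not the system described in the paper. The function names, the tiny `STOP_WORDS` set, the choice of sentence terminators, and the punctuation set are all hypothetical, and the stemmer for nouns and proper names is deliberately left out because its rules are not given here.

```python
import re

# Hypothetical, very small stop word list for illustration only;
# the stop word list actually used by the system is not given in this text.
STOP_WORDS = {"ਦਾ", "ਦੇ", "ਦੀ", "ਹੈ", "ਨੂੰ", "ਤੇ", "ਅਤੇ", "ਵਿੱਚ"}

# Assumed sentence terminators: danda, double danda, question mark, exclamation mark.
SENTENCE_END = re.compile(r"[।॥?!]+")

# Assumed punctuation to trim from the edges of each word.
PUNCTUATION = ",;:\"'()-"


def split_sentences(text):
    """Sentence boundary identification: split on the assumed terminators."""
    return [s.strip() for s in SENTENCE_END.split(text) if s.strip()]


def split_words(sentence):
    """Word boundary identification: split on whitespace, trim punctuation."""
    trimmed = (w.strip(PUNCTUATION) for w in sentence.split())
    return [w for w in trimmed if w]


def remove_stop_words(words):
    """Stop word elimination against the illustrative list above."""
    return [w for w in words if w not in STOP_WORDS]


def drop_duplicates(sentences):
    """Duplicate sentence elimination, keeping the first occurrence in order."""
    seen = set()
    unique = []
    for s in sentences:
        key = " ".join(s.split())  # normalise internal whitespace before comparing
        if key not in seen:
            seen.add(key)
            unique.append(s)
    return unique


def preprocess(text):
    """Return (sentence, content_words) pairs in original sentence order."""
    sentences = drop_duplicates(split_sentences(text))
    return [(s, remove_stop_words(split_words(s))) for s in sentences]


if __name__ == "__main__":
    sample = "ਇਹ ਕਿਤਾਬ ਹੈ। ਇਹ ਕਿਤਾਬ ਹੈ। ਉਹ ਸਕੂਲ ਵਿੱਚ ਹੈ।"
    for sentence, words in preprocess(sample):
        print(sentence, "->", words)
```

Keeping each step as a separate function mirrors the way the abstract enumerates them, so a stemming step could later be slotted in between word splitting and stop word elimination without changing the other functions.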
