A study of behavioural patterns in user-focused summarization

Different people often select different content from a multi-document cluster when writing a summary. To improve summarization, we focus on content selection and on identifying the features that make content valuable. Statistical methods play an important role in automatic summarization. In this paper, we use a statistical approach to study the correlation between the feature values of a content unit in the original document cluster and the probability that the unit is selected for a human summary. Taking Basic Elements (BEs) and words as content units, we draw several conclusions for user-focused summarization. The BE works well as the granularity of a content unit, and the frequency feature of a BE reflects the importance of a content unit better than its TF-IDF value. Moreover, the topic given in user-focused summarization helps both the selection of content units and the quality of the summary: people tend to select content units that occur relatively frequently in the sentences containing the topic's content units and in neighbouring sentences. By studying these behavioural patterns of manual summarization, we can incorporate these factors affecting summary quality into content-unit selection and summary generation to improve automatic summarization.
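
Below is a minimal sketch of the kind of analysis the abstract describes, not the paper's actual method: single words stand in for Basic Elements, the document cluster and reference summaries are toy examples, and all function names are illustrative. It computes a cluster-wide frequency feature and a TF-IDF feature for each content unit, estimates how often human reference summaries include the unit, and compares how strongly each feature correlates with that selection probability.

```python
# Hedged sketch: correlate content-unit features with human selection probability.
# Assumption: words approximate Basic Elements; data is toy/illustrative.
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def frequency_feature(cluster):
    """Raw frequency of each unit across the whole document cluster."""
    counts = Counter()
    for doc in cluster:
        counts.update(tokenize(doc))
    return counts


def tfidf_feature(cluster):
    """TF-IDF of each unit, with IDF computed over documents in the cluster."""
    n_docs = len(cluster)
    doc_tokens = [tokenize(doc) for doc in cluster]
    df = Counter()
    for tokens in doc_tokens:
        df.update(set(tokens))
    tf = Counter()
    for tokens in doc_tokens:
        tf.update(tokens)
    return {w: tf[w] * math.log(n_docs / df[w]) for w in tf}


def selection_probability(reference_summaries):
    """Fraction of human reference summaries in which each unit appears."""
    n = len(reference_summaries)
    appears = Counter()
    for summary in reference_summaries:
        appears.update(set(tokenize(summary)))
    return {w: appears[w] / n for w in appears}


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of feature values."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


if __name__ == "__main__":
    cluster = [
        "the storm damaged the coastal town and flooded several roads",
        "rescue teams reached the town after the storm flooded the roads",
        "officials said the storm caused severe flooding in the town",
    ]
    references = [
        "the storm flooded the town",
        "storm flooding damaged the coastal town",
    ]
    freq = frequency_feature(cluster)
    tfidf = tfidf_feature(cluster)
    prob = selection_probability(references)
    units = [w for w in prob if w in freq]
    print("corr(frequency, selection):",
          round(pearson([freq[w] for w in units], [prob[w] for w in units]), 3))
    print("corr(tf-idf, selection):",
          round(pearson([tfidf[w] for w in units], [prob[w] for w in units]), 3))
```

On real DUC-style data the same comparison would be run over extracted Basic Elements rather than raw tokens, and the selection probability would come from multiple human model summaries per cluster.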
