Recent Advances in Computational Linguistics

Abstractive summarization approaches use information extraction, ontological information, information fusion, and compression. Automatically generated abstracts (abstractive summaries) move the summarization field from purely extractive methods to the generation of abstracts that contain sentences not found in any of the input documents and that can synthesize information across sources. An abstract contains at least some sentences (or phrases) that do not exist in the original document. Of course, true abstraction takes the process one step further: it involves recognizing that a set of extracted passages together constitutes something new, something not explicitly mentioned in the source, and then replacing those passages in the summary with the new concepts. The requirement that the new material not be explicit in the text means that the system must have access to external information of some kind, such as an ontology or a knowledge base, and be able to perform combinatory inference.

Recently, Ledeneva et al. [Led08a, Led08b, Led08c] and Garcia et al. [Gar08a, Gar08b, Gar09] have successfully employed word sequences from the self-text to detect candidate text fragments for composing the summary. Ledeneva et al. [Led08a] suggest that a typical automatic extractive summarization approach is composed of four steps: term selection, term weighting, sentence weighting, and sentence selection. One way to select appropriate sentences is to assign each sentence a numerical measure of its usefulness for the summary and then select the best ones; the process of assigning these usefulness weights is called sentence weighting. One way to estimate the usefulness of a sentence is to sum the usefulness weights of the individual terms of which it consists; the process of estimating the weights of individual terms is called term weighting.
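The weighting and selection steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method of the cited authors: terms are taken to be single lowercased words, the term weight is the raw frequency of the word in the text, sentences are split with a naive regular expression, and the names `split_sentences`, `tokenize`, `summarize`, and the parameter `k` are hypothetical.

```python
import re
from collections import Counter


def split_sentences(text):
    """Naive sentence splitter on '.', '!' and '?' (an assumption for this sketch)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Term selection (assumed): every lowercased word counts as a term."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def summarize(text, k=2):
    """Return the k highest-weighted sentences in their original order."""
    sentences = split_sentences(text)
    tokens = [tokenize(s) for s in sentences]

    # Term weighting (assumed): raw frequency of the term in the whole text.
    term_weight = Counter(t for sent in tokens for t in sent)

    # Sentence weighting: sum of the weights of the terms in the sentence.
    sentence_weight = [sum(term_weight[t] for t in sent) for sent in tokens]

    # Sentence selection: keep the k best sentences, then restore text order.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: sentence_weight[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]


if __name__ == "__main__":
    print(summarize("Cats chase mice. Dogs chase cats. Birds sing.", k=2))
```

Summing raw frequencies favors long sentences; a real system would probably normalize by sentence length or use a different term-weighting scheme, which is exactly the kind of design choice that distinguishes one extractive method from another.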
To weight terms, one must first decide what the terms are (for example, they can be words); deciding which objects count as terms is the task of term selection. Different extractive summarization methods can be characterized by how they perform these tasks.

Ledeneva et al. [Led08a, Led08b, Led08c] have proposed extracting all the frequent grams from the self-text but considering only those that are not contained (as a subsequence) in other frequent grams, that is, the maximal frequent word sequences. In comparison with n-grams, Maximal Frequent Sequences (MFSs) are attractive for extractive text summarization because it is not necessary to define the gram size (n); that is, the length of each MFS is determined by the self-text. Moreover, the set of all extracted MFSs is a compact representation of all frequent word sequences, which reduces the dimensionality of a vector space model. Garcia et al. [Gar08b, Gar09] have extracted all the sequences of n words (n-grams) from the self-text as features of their model.

RECENT ADVANCES IN CL Informatica 34 (2010) 501–517 511

In this work, we evaluate n-grams and maximal frequent sequences as domain- and language-independent models for automatic text summarization. Sentences were extracted using an unsupervised learning approach.

Other methods have also been developed for abstractive summarization, for example, techniques of sentence fusion [Dau04, Bar03, Bar05], information fusion [Bar99], sentence compression [Van04, Mad07], and headline summarization [Sar05].

4.3.3 Recent applications of text
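The maximal frequent sequences (MFSs) described above can be illustrated with a brute-force sketch. It rests on several assumptions that are not taken from the cited papers: a sequence is a contiguous run of words within one sentence, "frequent" means occurring at least `min_freq` times in the text, and containment is checked as contiguous containment. The function names are hypothetical, and the quadratic enumeration is meant for small examples only, not as a substitute for the fast algorithms in the literature.

```python
from collections import Counter


def frequent_ngrams(sentences, min_freq=2):
    """Count every contiguous word sequence; keep those seen at least min_freq times."""
    counts = Counter()
    for words in sentences:
        for i in range(len(words)):
            for j in range(i + 1, len(words) + 1):
                counts[tuple(words[i:j])] += 1
    return {gram for gram, count in counts.items() if count >= min_freq}


def contains(longer, shorter):
    """True if `shorter` occurs as a contiguous run inside `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def maximal_frequent_sequences(sentences, min_freq=2):
    """Frequent sequences that are not contained in any longer frequent sequence."""
    frequent = frequent_ngrams(sentences, min_freq)
    return {
        gram for gram in frequent
        if not any(len(other) > len(gram) and contains(other, gram)
                   for other in frequent)
    }


if __name__ == "__main__":
    text = [
        ["the", "big", "dog", "barks"],
        ["the", "big", "dog", "sleeps"],
        ["a", "dog", "barks"],
    ]
    print(maximal_frequent_sequences(text, min_freq=2))
```

In this toy text, eight word sequences are frequent, but only two of them are maximal, which shows how the MFS set compresses the set of frequent grams without fixing the gram size n in advance.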
