Summarization of texts found on the World Wide Web

Summaries of texts found on the World Wide Web are valuable: they help search engine users select information and make it easier to process the vast amount of information on the Web. This chapter describes technologies that can be applied to summarize the texts of Web pages, focusing on those that currently generate the best results and that suit the heterogeneous environment of the World Wide Web. It gives an overview of generic, query-biased, and task-specific summarization, as well as single-document and multi-document summarization. The technologies discussed include semantic frame technologies, rhetorical structure analysis, the learning of discourse patterns, techniques that rely on lexical cohesion, and text clustering.
