Automatic summarising and the CLASP system

This dissertation discusses summarisers and summarising in general, and presents a new summarising system, CLASP. In chapters 1-3, I present a framework for thinking about summarisers in terms of context factors and the three stages of analysis, condensation and synthesis. I look at previous research in automatic summarising and identify four main directions that have been taken. I consider how summarising systems may be and have been evaluated. CLASP, described in chapters 4-7, takes a new approach based on a shallow semantic representation of the source text as a predication cohesion graph. Nodes in the graph are simple predications corresponding to events, states and entities mentioned in the text; edges indicate related or similar nodes. Summary content is chosen by selecting some of these predications according to criteria of importance, representativeness and cohesiveness. These criteria are expressed as functions on the nodes of a weighted graph. Summary text is produced either by extracting whole sentences from the source text, or by generating short, indicative summary phrases from the selected predications. CLASP uses linguistic processing but no domain knowledge, and therefore does not restrict the subject matter of the source text. It is intended to deal robustly with complex texts that it cannot analyse completely accurately or in full. Chapter 8 describes experiments in summarising stories from the Wall Street Journal. The results suggest that there may be a benefit in identifying important material in a semantic representation rather than a surface one, but that, despite the robustness of the source representation, inaccuracies in CLASP’s linguistic analysis can dramatically affect the readability of its summaries. In chapter 9, I suggest ways in which CLASP could be modified to overcome this and other problems.
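
To make the idea of choosing summary content from a weighted graph more concrete, the following is a minimal sketch and not CLASP's actual algorithm. It assumes predications are plain strings, that edge weights are already given, and that importance can be approximated by the summed weight of a node's edges and cohesiveness by the summed weight of its links to nodes already chosen. The function names, the scoring formulas and the `cohesion_bonus` parameter are all hypothetical, and the representativeness criterion is left out for brevity.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected weighted graph from (node_a, node_b, weight) triples."""
    graph = defaultdict(dict)
    for a, b, w in edges:
        graph[a][b] = w
        graph[b][a] = w
    return graph


def importance(graph, node):
    """Assumed importance score: total weight of the edges touching the node."""
    return sum(graph[node].values())


def cohesion(graph, node, selected):
    """Assumed cohesiveness score: total weight linking the node to chosen nodes."""
    return sum(w for nb, w in graph[node].items() if nb in selected)


def select_predications(graph, k, cohesion_bonus=0.5):
    """Greedily pick up to k nodes, scoring importance plus a cohesion bonus."""
    selected = []
    candidates = set(graph)
    while candidates and len(selected) < k:
        # Sorting makes tie-breaking deterministic (alphabetical order).
        best = max(
            sorted(candidates),
            key=lambda n: importance(graph, n)
            + cohesion_bonus * cohesion(graph, n, selected),
        )
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy predications; the labels and weights are invented for illustration.
edges = [
    ("acquire", "announce", 2.0),
    ("announce", "rise", 1.0),
    ("acquire", "rise", 1.0),
    ("say", "rise", 0.5),
]
graph = build_graph(edges)
summary_nodes = select_predications(graph, k=2)
```

In this sketch the selected nodes would then be handed to a synthesis step, for example one that extracts the source sentences containing them. The greedy loop is only one of many ways to combine node-level criteria.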
