Summarizing electronic discourse

The explosion of available information on the Internet has fueled the demand for automatic methods of text summarization. Existing approaches have primarily focused on abstracting documents such as news articles or technical papers. In this article, we examine how to create summaries of on-line asynchronous communication, in particular discussion groups. First, we provide background on the nature of discussions as informal communication and give a short history of computer conferencing and discussion systems. We then explain our approach to the problem and report a set of observations and experiments, putting our work in the context of research on automatic text summarization. Next, we describe a hierarchical discourse summarization algorithm and its implementation in a system called the Interactive Discussion Summarizer (IDS). We close with discussion and conclusions. Copyright © 2002 John Wiley & Sons, Ltd.
