Automatic multidocument summarization of research abstracts: Design and user evaluation

The purpose of this study was to develop a method for automatic construction of multidocument summaries of sets of research abstracts that may be retrieved by a digital library or search engine in response to a user query. Sociology dissertation abstracts were selected as the sample domain in this study. A variable-based framework was proposed for integrating and organizing research concepts and relationships as well as research methods and contextual relations extracted from different dissertation abstracts. Based on the framework, a new summarization method was developed, which parses the discourse structure of abstracts, extracts research concepts and relationships, integrates the information across different abstracts, and organizes and presents it in a Web-based interface. The focus of this article is on the user evaluation that was performed to assess the overall quality and usefulness of the summaries. Two types of variable-based summaries generated using the summarization method, with or without the use of a taxonomy, were compared against two sentence-based summaries: one that lists only the research-objective sentences extracted from each abstract, and another generated using the MEAD system, which extracts important sentences. The evaluation results indicate that the majority of sociological researchers (70%) and general users (64%) preferred the variable-based summaries generated with the use of the taxonomy. © 2007 Wiley Periodicals, Inc.
