Automatic multidocument summarization of research abstracts: Design and user evaluation

The purpose of this study was to develop a method for automatically constructing multi-document summaries of sets of research abstracts that may be retrieved by a digital library or search engine in response to a user query. Sociology dissertation abstracts were selected as the sample domain. A variable-based framework was proposed for integrating and organizing research concepts and relationships, as well as research methods and contextual relations, extracted from different dissertation abstracts. Based on the framework, a new summarization method was developed that parses the discourse structure of abstracts, extracts research concepts and relationships, integrates the information across different abstracts, and organizes and presents it in a Web-based interface. A user evaluation was performed to assess the overall quality and usefulness of the summaries. Two types of variable-based summaries generated using the summarization method, one with and one without the use of a taxonomy, were compared against two sentence-based summaries: one that only lists the research objective sentences extracted from each abstract, and one generated using the MEAD system, which extracts important sentences. The evaluation results indicated that the majority of sociological researchers (70%) and general users (64%) preferred the variable-based summaries generated with the use of the taxonomy.
