Towards Automatic Generation of Gene Summary

We present an extractive system that automatically generates gene summaries from the biomedical literature. The system selects and ranks sentences from multiple MEDLINE abstracts by exploiting gene-specific information and similarity relationships between sentences. We evaluate it on a large dataset of 7,294 human genes and 187,628 MEDLINE abstracts using Recall-Oriented Understudy for Gisting Evaluation (ROUGE), an automatic evaluation metric widely used in the text summarization community. Two baseline methods are used for comparison. Experimental results show that our system significantly outperforms both baselines on all ROUGE metrics. A demo website of our system is freely accessible at http://60.195.250.72/onbires/summary.jsp.
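To make the evaluation measure concrete, the following is a minimal sketch of ROUGE-N recall, the n-gram overlap between a generated summary and a reference summary divided by the number of n-grams in the reference. It is an illustration only and not the official ROUGE toolkit: the function names `ngrams` and `rouge_n_recall`, the whitespace tokenization, the lowercasing, and the example sentences are all assumptions made for this sketch.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """ROUGE-N recall: clipped n-gram matches / n-grams in the reference.

    Assumption: naive lowercase whitespace tokenization, with no stemming
    or stopword removal, unlike some configurations of the real toolkit.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Each reference n-gram is credited at most as often as it occurs
    # in the candidate (clipped counts).
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


if __name__ == "__main__":
    # Hypothetical sentences, not taken from the paper's dataset.
    generated = "the gene encodes a kinase"
    reference = "this gene encodes a protein kinase"
    print(rouge_n_recall(generated, reference, n=1))
    print(rouge_n_recall(generated, reference, n=2))
```

In this toy case four of the six reference unigrams and two of the five reference bigrams appear in the generated sentence, so recall drops as n grows. A full evaluation would average such scores over all genes and would normally rely on the standard ROUGE implementation.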
