Text summarization using a trainable summarizer and latent semantic analysis

This paper proposes two approaches to text summarization: a modified corpus-based approach (MCBA) and an LSA-based text relationship map approach (LSA + T.R.M.). The first is a trainable summarizer that scores sentences using several features, namely position, positive keywords, negative keywords, centrality, and resemblance to the title. It exploits two new ideas: (1) sentence positions are ranked to emphasize the differing significance of each position, and (2) the score function is trained with a genetic algorithm (GA) to obtain a suitable combination of feature weights. The second approach uses latent semantic analysis (LSA) to derive the semantic matrix of a document or a corpus, and uses the resulting semantic sentence representations to construct a semantic text relationship map. We evaluate LSA + T.R.M. both on single documents and at the corpus level to investigate how well LSA performs in text summarization. Both approaches were evaluated at several compression rates on a corpus of 100 political articles. At a compression rate of 30%, the average f-measures were 49% for MCBA, 52% for MCBA + GA, and 44% and 40% for LSA + T.R.M. at the single-document and corpus levels, respectively.
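
The MCBA idea of a weighted feature score whose weights are tuned by a genetic algorithm can be illustrated with a minimal Python sketch. This is not the authors' implementation: the exact feature formulas (keyword ratios, word-overlap centrality, Jaccard title resemblance), the position table `pos_score`, and the GA settings (tournament selection, uniform crossover, Gaussian mutation, elitism) are illustrative assumptions, and every name (`Doc`, `features`, `summarize`, `f_measure`, `train_weights`) is hypothetical. Fitness is taken to be the mean f-measure of the extracted sentences against a reference extract at a fixed compression rate.

```python
import random
from dataclasses import dataclass


@dataclass
class Doc:
    sentences: list[list[str]]  # each sentence is a list of tokens
    title: list[str]
    reference: set[int]  # indices of the sentences in the human reference extract


def features(doc, pos_score, pos_kw, neg_kw):
    """One 5-feature vector per sentence; feature definitions are illustrative assumptions."""
    title = set(doc.title)
    vecs = []
    for i, sent in enumerate(doc.sentences):
        words = set(sent)
        others = {w for j, s in enumerate(doc.sentences) if j != i for w in s}
        n = max(len(sent), 1)
        vecs.append([
            pos_score.get(i, 0.0),                            # position: hypothetical rank-based table
            sum(w in pos_kw for w in sent) / n,               # share of positive keywords
            -sum(w in neg_kw for w in sent) / n,              # negative keywords lower the score
            len(words & others) / max(len(words), 1),         # centrality: overlap with the rest of the document
            len(words & title) / max(len(words | title), 1),  # resemblance to the title (Jaccard)
        ])
    return vecs


def summarize(doc, weights, tables, rate=0.3):
    """Extract the top-scoring sentences; `rate` is the compression rate."""
    vecs = features(doc, *tables)
    scores = [sum(w * f for w, f in zip(weights, v)) for v in vecs]
    k = max(1, round(rate * len(doc.sentences)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])


def f_measure(selected, reference):
    """Harmonic mean of precision and recall over sentence indices."""
    hit = len(selected & reference)
    if hit == 0:
        return 0.0
    p, r = hit / len(selected), hit / len(reference)
    return 2 * p * r / (p + r)


def train_weights(docs, tables, rate=0.3, pop_size=20, generations=30, seed=0):
    """Toy GA over weight vectors in [0, 1]^5; fitness = mean f-measure on the training docs."""
    rng = random.Random(seed)

    def fitness(w):
        return sum(f_measure(summarize(d, w, tables, rate), d.reference) for d in docs) / len(docs)

    pop = [[rng.random() for _ in range(5)] for _ in range(pop_size)]
    for _ in range(generations):
        fits = [fitness(w) for w in pop]
        order = sorted(range(pop_size), key=lambda i: fits[i], reverse=True)
        nxt = [pop[order[0]], pop[order[1]]]  # elitism: carry over the two best

        def pick():  # tournament selection of size 3
            return pop[max(rng.sample(range(pop_size), 3), key=lambda i: fits[i])]

        while len(nxt) < pop_size:
            a, b = pick(), pick()
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]  # uniform crossover
            child = [min(1.0, max(0.0, g + rng.gauss(0, 0.1))) if rng.random() < 0.2 else g
                     for g in child]  # Gaussian mutation, clipped to [0, 1]
            nxt.append(child)
        pop = nxt
    best = max(pop, key=fitness)
    return best, fitness(best)


# Toy usage: one document whose reference extract is sentences 0 and 2.
doc = Doc(
    sentences=[["economy", "grows"], ["weather", "sunny"],
               ["economy", "policy", "vote"], ["sports", "news"]],
    title=["economy", "policy"],
    reference={0, 2},
)
tables = ({0: 1.0, 1: 0.5, 2: 0.25, 3: 0.1}, {"economy", "policy"}, {"sports", "weather"})
w, f = train_weights([doc], tables, rate=0.5)
print(summarize(doc, w, tables, rate=0.5), round(f, 2))
```

On a real corpus the fitness would be averaged over many training documents, and the position table and keyword sets would presumably be estimated from that training corpus; here they are hand-written placeholders. Because the weights only change through the fitness function, any other feature set can be plugged into `features` without touching the GA.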

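The LSA + T.R.M. approach can likewise be sketched for a single document. The following assumes a raw term-frequency word-by-sentence matrix, a rank-`k` truncated SVD (`numpy.linalg.svd`), cosine similarity between the reconstructed sentence columns, a fixed link `threshold`, and a "bushy" selection that keeps the sentences with the most links in the text relationship map. These choices and the function name `lsa_trm_summary` are hypothetical; the paper's weighting scheme, dimensionality, and path selection may differ. Requires NumPy.

```python
import numpy as np


def lsa_trm_summary(sentences, k=2, threshold=0.5, rate=0.3):
    """Sketch of LSA + text relationship map selection for one document (assumed details)."""
    vocab = sorted({w for s in sentences for w in s})
    index = {w: i for i, w in enumerate(vocab)}
    # Word-by-sentence matrix of raw term frequencies (the weighting scheme is an assumption).
    A = np.zeros((len(vocab), len(sentences)))
    for j, sent in enumerate(sentences):
        for w in sent:
            A[index[w], j] += 1
    # Truncated SVD: keep the k largest singular values and rebuild the semantic matrix.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(s))
    semantic = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    # Each column is a semantic sentence vector; compare them by cosine similarity.
    norms = np.linalg.norm(semantic, axis=0)
    norms[norms == 0] = 1.0
    X = semantic / norms
    sim = X.T @ X
    # Text relationship map: link every pair of distinct sentences above the threshold.
    links = (sim > threshold) & ~np.eye(len(sentences), dtype=bool)
    degree = links.sum(axis=1)
    n_keep = max(1, round(rate * len(sentences)))
    # "Bushy" selection: the most-linked sentences, returned in document order.
    chosen = sorted(np.argsort(-degree, kind="stable")[:n_keep].tolist())
    return chosen, links


sents = [["tax", "policy", "vote"], ["tax", "policy", "reform"],
         ["policy", "vote", "reform"], ["football", "match"]]
chosen, links = lsa_trm_summary(sents, k=2, threshold=0.5, rate=0.5)
print(chosen, links.sum(axis=1).tolist())
```

In this toy input the three politics sentences collapse onto one latent dimension and link to each other, while the football sentence stays unlinked. A corpus-level variant would, under the same assumptions, run the SVD on a word-by-sentence matrix built from the whole corpus and then build the map from the columns belonging to the target document.
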
Wei-Pang Yang | Hao-Ren Ke | I-Heng Meng | Jen-Yuan Yeh
