Create Special Domain News Collections through Summarization and Classification

In this paper, we present a novel technique for building a special-domain news collection system from Really Simple Syndication (RSS) news sites through summarization and classification. The main aim of this research is to build a self-sufficient news collection system for the disaster domain. In this system, we use new strategies and algorithms to mine news from RSS sites and to recognize and collect disaster news using automatic summarization and classification. The main difference between our study and previous work is that we use a novel summarization approach to improve classification performance. This paper discusses the effect of the summarization and classification models on system performance. Results show that our method yields better performance in this field. Copyright © 2010 Institute of Electrical Engineers of Japan. Published by John Wiley & Sons, Inc.
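The pipeline named in the abstract (mine RSS items, summarize each item, then classify the summary) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not describe the authors' summarization or classification models, so a simple word-frequency sentence scorer and a keyword-overlap rule stand in for them. The names `DISASTER_TERMS`, `parse_rss`, `summarize`, `is_disaster`, and `collect_disaster_news` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical keyword list; the abstract does not specify the real features.
DISASTER_TERMS = {"earthquake", "flood", "tsunami", "hurricane", "typhoon", "wildfire"}


def parse_rss(xml_text):
    """Return (title, description) pairs for each <item> in an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    return [
        (item.findtext("title", ""), item.findtext("description", ""))
        for item in root.iter("item")
    ]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def summarize(text, max_sentences=2):
    """Keep the sentences with the highest average word frequency.

    This is a stand-in frequency-based extractive summarizer, not the
    paper's own summarization approach.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(tokenize(text))

    def score(sentence):
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    top = sorted(sentences, key=score, reverse=True)[:max_sentences]
    # Emit the selected sentences in their original order.
    return " ".join(s for s in sentences if s in top)


def is_disaster(text, min_hits=1):
    """Stand-in classifier: count distinct disaster keywords in the text."""
    return len(DISASTER_TERMS & set(tokenize(text))) >= min_hits


def collect_disaster_news(xml_text):
    """Summarize each feed item, then keep titles whose summary is classified as disaster news."""
    kept = []
    for title, body in parse_rss(xml_text):
        summary = summarize(body)
        if is_disaster(title + " " + summary):
            kept.append(title)
    return kept
```

Classifying the summary rather than the full article mirrors the idea stated in the abstract that summarization is used to improve classification; whether this simple stand-in would show the same effect is not something the sketch can establish.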
