An Automatic Online News Topic Keyphrase Extraction System

News topics are characterized by a set of keywords or keyphrases. Topic keyphrases briefly describe the key content of a topic and help users decide whether to read further about it. Moreover, the keyphrases of a news topic can be regarded as a cluster of related terms, which provides term-relationship information that can be integrated into information retrieval models. In this paper, an automatic online news topic keyphrase extraction system is proposed. News stories are organized into topics. Keyword candidates are first extracted from individual news stories and filtered using topic information. Then a phrase identification process combines keywords into phrases using their position information. Finally, the phrases are ranked and the top-ranked ones are selected as topic keyphrases. Experiments performed on real-world Web datasets show that the proposed system is effective, achieving a precision of 70.61% and a recall of 67.94%.
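The three stages above (per-story keyword candidates filtered by topic, position-based phrase identification, ranking) can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's actual method: the abstract does not specify the scoring functions, so candidates are scored here with a smoothed TF-IDF against hypothetical background document frequencies, topic filtering keeps terms shared by at least half of the topic's stories (the `min_story_ratio` threshold is an assumption), phrases are runs of keywords at consecutive token positions, and phrases are ranked by the number of stories containing them. All function names are hypothetical. The set-based precision and recall at the end are the standard definitions behind the reported figures.

```python
import math
from collections import Counter


def candidate_keywords(story, corpus_df, n_docs, top_n=5):
    """Stage 1a (assumed smoothed TF-IDF scorer): top-n terms of one story."""
    tf = Counter(story)
    scores = {
        w: c * math.log((1 + n_docs) / (1 + corpus_df.get(w, 0)))
        for w, c in tf.items()
    }
    # Ties are broken alphabetically so the output is deterministic.
    return set(sorted(scores, key=lambda w: (-scores[w], w))[:top_n])


def filter_by_topic(per_story_candidates, min_story_ratio=0.5):
    """Stage 1b (assumed rule): keep candidates shared by enough stories."""
    counts = Counter(w for cands in per_story_candidates for w in cands)
    needed = min_story_ratio * len(per_story_candidates)
    return {w for w, c in counts.items() if c >= needed}


def identify_phrases(stories, keywords):
    """Stage 2: merge keywords at consecutive token positions into phrases."""
    phrase_counts = Counter()
    for story in stories:
        found, run = set(), []
        for token in story + [None]:  # None is a sentinel that ends the last run
            if token in keywords:
                run.append(token)
            elif run:
                found.add(" ".join(run))
                run = []
        phrase_counts.update(found)  # count each phrase once per story
    return phrase_counts


def top_keyphrases(phrase_counts, k=5):
    """Stage 3 (assumed ranking): story frequency, then length, then text."""
    ranked = sorted(
        phrase_counts, key=lambda p: (-phrase_counts[p], -len(p.split()), p)
    )
    return ranked[:k]


def precision_recall(extracted, gold):
    """Set-based precision and recall against annotated keyphrases."""
    hits = len(set(extracted) & set(gold))
    return hits / len(extracted), hits / len(gold)


# Toy topic of three pre-tokenized stories (illustrative data only).
stories = [
    "olympic games open in beijing today".split(),
    "beijing olympic games draw record crowds".split(),
    "athletes arrive for olympic games in beijing".split(),
]
# Hypothetical background document frequencies from a 100-story collection.
corpus_df = {"in": 95, "for": 90, "today": 60, "record": 30}

cands = [candidate_keywords(s, corpus_df, n_docs=100) for s in stories]
keywords = filter_by_topic(cands)
phrases = identify_phrases(stories, keywords)
result = top_keyphrases(phrases, k=2)
print(result)  # ['olympic games', 'beijing']
print(precision_recall(result, ["olympic games", "beijing", "opening ceremony"]))
# (1.0, 0.6666666666666666)
```

In this toy run, story-specific terms such as "open" or "crowds" survive the per-story step but are dropped by the topic filter, because they appear in only one of the three stories. The run-merging rule also yields the longer phrase "beijing olympic games" from the second story; whether such merged runs should outrank their parts is a design choice the ranking stage has to make.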
