Keyphrase Extraction Using Semantic Networks Structure Analysis

Keyphrases play a key role in text indexing, summarization, and categorization. However, most existing keyphrase extraction approaches require human-labeled training sets. In this paper, we propose an automatic keyphrase extraction algorithm that can be used in both supervised and unsupervised tasks. The algorithm treats each document as a semantic network and uses the structural dynamics of that network to extract keyphrases (key nodes) without supervision. Experiments show that, in unsupervised tasks, the proposed algorithm improves effectiveness by 50% and efficiency by 30% on average, and that it performs comparably to supervised extractors. Moreover, by applying the algorithm to supervised tasks, we develop a classifier with an overall accuracy of up to 80%.
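
To make the idea of reading key nodes off a document's network structure concrete, here is a minimal Python sketch. It is not the algorithm proposed in this paper, whose details the abstract does not give. It assumes a word co-occurrence graph as the "semantic network" and scores each node by how much the graph's global efficiency (the mean inverse shortest-path length) drops when that node is removed. The function names `build_graph`, `efficiency`, and `rank_nodes`, and the `window` parameter, are hypothetical and chosen only for illustration.

```python
from collections import deque


def build_graph(tokens, window=2):
    """Link words that co-occur within `window` consecutive tokens (assumed network model)."""
    graph = {t: set() for t in tokens}
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    return graph


def efficiency(graph, skip=None):
    """Mean of 1/d over ordered pairs of remaining nodes; unreachable pairs add 0."""
    nodes = [n for n in graph if n != skip]
    if len(nodes) < 2:
        return 0.0
    total = 0.0
    for source in nodes:
        # Breadth-first search gives shortest path lengths in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            current = queue.popleft()
            for nxt in graph[current]:
                if nxt != skip and nxt not in dist:
                    dist[nxt] = dist[current] + 1
                    queue.append(nxt)
        total += sum(1.0 / d for d in dist.values() if d > 0)
    return total / (len(nodes) * (len(nodes) - 1))


def rank_nodes(tokens, window=2):
    """Order words by the efficiency lost when each one is removed (largest loss first)."""
    graph = build_graph(tokens, window)
    base = efficiency(graph)
    scores = {n: base - efficiency(graph, skip=n) for n in graph}
    return sorted(scores, key=lambda n: (-scores[n], n))
```

For example, `rank_nodes("hub a hub b hub c hub d".split())` places `hub` first, because removing it disconnects every other word. A real extractor would also need stop-word filtering, phrase assembly from adjacent key nodes, and a faster scoring scheme than recomputing all shortest paths per node.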
