Automatic Keyword Extraction Using Linguistic Features

This paper describes a novel keyword extraction algorithm, position weight (PW), which uses linguistic features to represent the importance of a word's position in a document. Topical terms are extracted together with their collections of previous-term and next-term co-occurrences. To measure the degree of correlation between a topical term and its co-occurring terms, three methods are employed: term frequency inverse term frequency (TFITF), position weight inverse position weight (PWIPW), and chi-square (χ²). The co-occurring terms that have the highest degree of correlation and exceed a co-occurrence frequency threshold are combined with the original topical term to form the final keyword. Because the algorithm has linear computational complexity, the vector space of documents in a large corpus or the boundless Web can be quickly represented by sets of keywords, which makes fast and effective large-scale information retrieval possible.
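The abstract outlines the pipeline but does not give the formulas for PW, TFITF or PWIPW, so the following Python sketch illustrates only the chi-square variant of the co-occurrence step. It rests on stated assumptions: the topical term is supplied by the caller, the previous-term and next-term collections are taken to be the tokens immediately before and after it, and the names `extend_topical_term`, `chi_square` and `min_freq` are hypothetical, not the authors' code.

```python
from collections import Counter
from typing import List


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square statistic of the 2x2 contingency table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def extend_topical_term(tokens: List[str], topic: str, min_freq: int = 2) -> str:
    """Attach the most correlated previous and next terms to `topic`.

    Hypothetical sketch: correlation is chi-square over adjacent token pairs
    only; the paper's PW, TFITF and PWIPW weights are not modelled here.
    """
    pairs = list(zip(tokens, tokens[1:]))            # adjacent pairs, one pass
    n = len(pairs)
    pair_freq = Counter(pairs)
    as_left = Counter(left for left, _ in pairs)     # times a token starts a pair
    as_right = Counter(right for _, right in pairs)  # times a token ends a pair

    def score(a: int, row_total: int, col_total: int) -> float:
        # a = joint count; b, c = row/column counts without the joint cell.
        b, c = row_total - a, col_total - a
        return chi_square(a, b, c, n - a - b - c)

    # Previous-term collection: pairs (w, topic) above the frequency threshold.
    prev_scores = {
        w: score(a, as_left[w], as_right[topic])
        for (w, t), a in pair_freq.items()
        if t == topic and a >= min_freq
    }
    # Next-term collection: pairs (topic, w) above the frequency threshold.
    next_scores = {
        w: score(a, as_left[topic], as_right[w])
        for (t, w), a in pair_freq.items()
        if t == topic and a >= min_freq
    }
    # Keep the highest-scoring neighbour on each side (first one wins on ties).
    prev_term = max(prev_scores, key=prev_scores.get) if prev_scores else None
    next_term = max(next_scores, key=next_scores.get) if next_scores else None
    return " ".join(p for p in (prev_term, topic, next_term) if p is not None)
```

For example, with `tokens = "we use support vector machine models . a support vector machine is fast . support vector machine training is slow".split()`, calling `extend_topical_term(tokens, "vector")` returns `"support vector machine"`, while raising `min_freq` to 4 leaves just `"vector"`. Counting pairs is linear in the number of tokens, and the scoring scans at most that many distinct pairs, which loosely mirrors the linear-complexity claim for a single topical term.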

Bin Wu | Xinghua Hu