Thesaurus Based Term Ranking for Keyword Extraction

Many applications require assigning keywords to texts from a restricted set of possible keywords. A common way to find the best keywords is to rank the terms occurring in the text by their tf.idf value. This requires a corpus of texts from which document frequencies can be derived. In this paper we show that results of the same quality can be obtained without a background corpus, using the relations between terms provided in a thesaurus.
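To make the baseline concrete, the following is a minimal sketch of tf.idf ranking over a restricted vocabulary. It is an illustration only, not the implementation used in the paper: the function name `rank_by_tfidf`, the smoothed idf variant, and the toy data are all assumptions. It shows why the baseline needs a background corpus: the document frequency `df` cannot be computed without one.

```python
import math
from collections import Counter


def rank_by_tfidf(doc_tokens, corpus, vocabulary):
    """Rank controlled-vocabulary terms found in doc_tokens by tf.idf.

    doc_tokens: list of tokens of the text to index.
    corpus:     list of token sets, one per background document (assumed format).
    vocabulary: set of allowed keywords (the restricted set).
    """
    n_docs = len(corpus)
    # Term frequency, counting only terms from the restricted vocabulary.
    tf = Counter(t for t in doc_tokens if t in vocabulary)
    scores = {}
    for term, freq in tf.items():
        # Document frequency: this is the step that needs the corpus.
        df = sum(1 for doc in corpus if term in doc)
        # Smoothed idf (one common variant, chosen here as an assumption).
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        scores[term] = freq * idf
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))


# Hypothetical toy data.
corpus = [
    {"election", "parliament"},
    {"election", "budget"},
    {"election", "football"},
]
vocabulary = {"election", "budget", "football"}
doc = ["election", "election", "budget", "budget", "football", "the"]

ranking = rank_by_tfidf(doc, corpus, vocabulary)
print(ranking)
```

In this toy example "election" and "budget" occur equally often in the text, but "election" appears in every background document, so its idf is lower and "budget" ranks first.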
