Paper information - Automatic indexing using selective NLP and first-order thesauri

As one approach to automatic indexing, the CLARIT System uses selective natural-language processing (NLP) to identify candidate noun phrases in free text and maps them into candidate terms in a morphologically normalized form that emphasizes modifier and head relations. Candidate terms are matched against a first-order thesaurus of certified domain-specific terminology. Terms are scored and ranked based on the distribution statistics of the term (and its lexical items) in a document. Terms are also weighted according to their distribution both in a reference domain database and in a large, general corpus of English. The result is a tripartite indexing of a document by terms classified as exact (or certified), general, and novel, each ranked for relevance. In an evaluation comparing CLARIT automatic indexing of ten full-text articles in the domain of artificial intelligence with the indexing of two human subjects, CLARIT performed as well as, and in some respects better than, the humans.
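The tripartite classification described above can be illustrated with a minimal sketch. It is not the CLARIT implementation. The names `index_document` and `contains_phrase` are hypothetical, and the matching rules are assumptions made for illustration: a candidate counts as "exact" when it appears verbatim in the thesaurus, as "general" when a thesaurus term occurs inside it as a contiguous word sequence, and as "novel" otherwise. Ranking here uses raw in-document frequency only, standing in for the distribution-based scoring and corpus weighting the abstract mentions.

```python
from collections import Counter


def contains_phrase(words, phrase_words):
    """Return True if phrase_words occurs as a contiguous run inside words."""
    n = len(phrase_words)
    return any(words[i:i + n] == phrase_words for i in range(len(words) - n + 1))


def index_document(candidates, thesaurus):
    """Group candidate terms into exact, general, and novel lists.

    candidates: normalized candidate terms, one entry per occurrence.
    thesaurus:  set of certified terms (assumed already normalized).
    Each list is ranked by in-document frequency, then alphabetically.
    The matching rules are illustrative assumptions, not CLARIT's own.
    """
    counts = Counter(candidates)
    thesaurus_words = [t.split() for t in thesaurus]
    result = {"exact": [], "general": [], "novel": []}
    for term, freq in counts.items():
        words = term.split()
        if term in thesaurus:
            label = "exact"
        elif any(contains_phrase(words, tw) for tw in thesaurus_words):
            label = "general"
        else:
            label = "novel"
        result[label].append((term, freq))
    for label in result:
        # Highest frequency first; ties broken alphabetically.
        result[label].sort(key=lambda pair: (-pair[1], pair[0]))
    return result
```

Called with a list of candidate terms and a set of thesaurus entries, `index_document` returns a dictionary with three ranked lists of `(term, frequency)` pairs. A fuller treatment would replace the raw counts with weights drawn from a domain database and a general corpus.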
