As one approach to automatic indexing, the CLARIT System uses selective natural-language processing (NLP) to identify candidate noun phrases in free text and maps them to candidate terms in a morphologically normalized form that emphasizes modifier and head relations. Candidate terms are matched against a first-order thesaurus of certified domain-specific terminology. Terms are scored and ranked based on the distribution statistics of the term (and its lexical items) in a document. Terms are also weighted according to their distribution in both a reference domain database and a large, general corpus of English. The result is a tripartite indexing of a document by terms classified as exact (or certified), general, and novel, each ranked for relevance. In an evaluation comparing CLARIT automatic indexing of ten full-text articles in the domain of artificial intelligence with the indexing of two human subjects, CLARIT performed as well as, and in some respects better than, the humans.
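The tripartite classification described above can be illustrated with a minimal sketch. It is not the CLARIT implementation: the function name `classify_terms`, the use of raw in-document frequency as the ranking score, and the rule that a term counts as "general" when its head (last word) matches the head of a thesaurus entry are all assumptions made for illustration, since the abstract does not specify these details.

```python
from collections import Counter


def classify_terms(candidates, thesaurus):
    """Bucket candidate terms as exact, general, or novel, ranked by frequency.

    Assumptions (not from the source): ranking uses raw in-document
    frequency, and "general" means the term shares its head word (the last
    word) with at least one thesaurus entry.
    """
    counts = Counter(candidates)
    # Head words of all certified thesaurus entries (hypothetical criterion).
    thesaurus_heads = {entry.split()[-1] for entry in thesaurus}

    buckets = {"exact": [], "general": [], "novel": []}
    for term, freq in counts.items():
        if term in thesaurus:
            label = "exact"
        elif term.split()[-1] in thesaurus_heads:
            label = "general"
        else:
            label = "novel"
        buckets[label].append((term, freq))

    # Rank each bucket by descending frequency, breaking ties alphabetically.
    for ranked in buckets.values():
        ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return buckets


if __name__ == "__main__":
    terms = ["semantic network", "semantic network", "neural network", "parse tree"]
    print(classify_terms(terms, {"semantic network"}))
```

A fuller version would replace the frequency score with weights derived from the reference domain database and the general English corpus mentioned in the abstract.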