Automatic indexing

In information retrieval and text processing systems, search requests and stored information items are normally represented by sets of content identifiers, known as keywords or index terms. Choosing effective indexing products that accurately reflect document content is by far the most crucial task in retrieval. A wide variety of automatic indexing theories and systems have been developed over the years, based in part on the occurrence frequencies of individual terms in the documents of a collection, on the ability of specific terms to distinguish the documents from each other, and on the distribution of terms in the relevant as opposed to the nonrelevant items in a collection. These theories lead to the automatic generation of content identifiers consisting of individual terms, term phrases, and classes of terms, and to the assignment of term weights reflecting the relative importance of the terms for content representation. The various indexing theories are covered, and analytical as well as experimental results are given to demonstrate their effectiveness.
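
As a rough illustration of frequency-based term weighting, the sketch below assigns each term a weight that grows with its occurrence frequency in a document and shrinks as the term appears in more documents of the collection. It uses the widely known tf-idf formula as a stand-in and is not necessarily any of the specific theories covered here. The function names `tokenize` and `tf_idf_index`, the whitespace tokenizer, and the sample collection are all assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase, split on whitespace, keep alphabetic words.
    return [word for word in text.lower().split() if word.isalpha()]


def tf_idf_index(documents):
    """Return one {term: weight} dict per document.

    weight = (term frequency in the document) * log(N / document frequency),
    where N is the number of documents in the collection.
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    index = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        weights = {
            term: count * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        }
        index.append(weights)
    return index


# Assumed sample collection, for illustration only.
collection = ["the cat sat", "the dog sat", "the bird flew"]
index = tf_idf_index(collection)
```

In this sample, a term that occurs in every document ("the") receives weight zero, because it cannot distinguish one document from another, while a term confined to a single document ("cat") receives the highest weight in that document.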