Computation of term/document discrimination values by use of the cover coefficient

Indexing in information retrieval (IR) aims to obtain a suitable vocabulary of index terms and an optimal assignment of these terms to documents, so as to increase the effectiveness and efficiency of an IR system. The term discrimination value (TDV) is one of the criteria used for index‐term selection. In this article a new concept, the cover coefficient (CC), is used to compute TDVs. After a brief introduction to the theory of indexing and to the CC concept, an efficient way of computing TDVs with the CC concept is discussed, together with index‐term selection and weight modification. It is also shown that the computational cost of the CC approach to calculating TDVs compares favorably with that of an alternative approach based on similarity coefficients. Furthermore, the TDVs obtained by the CC approach are consistent with those obtained by the similarity-based approach. © 1987 John Wiley & Sons, Inc.
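To make the two approaches concrete, here is a minimal sketch over a small document-by-term matrix `D` (rows are documents, columns are terms). The similarity-based TDV follows the classical definition: it is the space density with term k removed minus the full space density, where the density is the average similarity of documents to the centroid. A positive value marks a good discriminator. The choice of cosine similarity and raw term weights is an assumption. The cover coefficient matrix uses c_ij = α_i Σ_k d_ik β_k d_jk, where α_i and β_k are the reciprocals of the row and column sums. Its diagonal entries (decoupling coefficients) sum to n_c, the estimated number of clusters. The CC-based TDV shown here, defined as the change in n_c when term k is removed, is a hypothetical illustration only and is not claimed to be the article's formula. All function names are illustrative.

```python
import numpy as np


def space_density(D: np.ndarray) -> float:
    """Average cosine similarity between each document (row) and the centroid."""
    centroid = D.mean(axis=0)
    norms = np.linalg.norm(D, axis=1) * np.linalg.norm(centroid)
    dots = D @ centroid
    # Documents left with no terms get similarity 0 instead of dividing by zero.
    sims = np.divide(dots, norms, out=np.zeros_like(dots), where=norms > 0)
    return float(sims.mean())


def similarity_tdvs(D) -> np.ndarray:
    """Classical TDV_k = Q_k - Q (density without term k minus full density)."""
    D = np.asarray(D, dtype=float)
    q = space_density(D)
    return np.array([space_density(np.delete(D, k, axis=1)) - q
                     for k in range(D.shape[1])])


def cover_coefficient(D) -> np.ndarray:
    """C[i, j] = alpha_i * sum_k d_ik * beta_k * d_jk; each row of C sums to 1."""
    D = np.asarray(D, dtype=float)
    D = D[D.sum(axis=1) > 0]       # assumption: ignore documents with no terms
    D = D[:, D.sum(axis=0) > 0]    # ...and terms that occur in no document
    alpha = 1.0 / D.sum(axis=1)    # reciprocal of each document's row sum
    beta = 1.0 / D.sum(axis=0)     # reciprocal of each term's column sum
    return (alpha[:, None] * D * beta[None, :]) @ D.T


def n_clusters(D) -> float:
    """n_c: sum of the decoupling coefficients delta_i = C[i, i]."""
    return float(np.trace(cover_coefficient(D)))


def cc_tdvs(D) -> np.ndarray:
    """HYPOTHETICAL CC-based TDV: n_c minus n_c with term k removed."""
    D = np.asarray(D, dtype=float)
    nc = n_clusters(D)
    return np.array([nc - n_clusters(np.delete(D, k, axis=1))
                     for k in range(D.shape[1])])


# Term 0 occurs in every document, so it should discriminate poorly.
D = np.array([[1, 1, 0],
              [1, 0, 1]])
print(similarity_tdvs(D))
print(cover_coefficient(D))
print(n_clusters(D))
print(cc_tdvs(D))
```

In this toy matrix, both measures assign a negative value to term 0, which appears in every document, and positive values to terms 1 and 2. This agrees in direction with the consistency the abstract reports. Each TDV is recomputed from scratch by deleting a column, which is the naive O(n) repetition the efficient methods in the article aim to avoid.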
