Exploiting the maximum entropy principle to increase retrieval effectiveness

Several drawbacks of conventional information retrieval systems can be overcome by a design in which queries consist of sets of terms, either unweighted or weighted with subjective term precision estimates, and retrieval outputs are ranked by probability of usefulness, estimated in accordance with the so‐called “maximum entropy principle.” A system organized along these lines combines the convenience of a simple input language with a powerful probabilistic inference mechanism capable of exploiting kinds of statistical clues not ordinarily used in systems of traditional design. Because the maximum entropy principle is sensitive to the frequencies and joint frequencies with which terms have been assigned to documents in the collection, the resulting design gains power and expressiveness without a concomitant increase in the complexity of the request language. Such a system incorporates the more important search capabilities of both Boolean and conventional weighted‐request languages and facilitates the use of unconventional search clues.
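The inference step described above can be pictured with a minimal sketch; it is an illustration under stated assumptions, not the paper's own procedure. It assumes a two-term query, binary variables (term A assigned, term B assigned, document useful), and made-up figures for the term frequencies, the joint frequency, the subjective term precisions, and the overall share of useful documents. The maximum entropy distribution satisfying those constraints is approximated by iterative scaling, a standard fitting technique chosen here for brevity. All names (`fit_maxent`, `CONSTRAINTS`, `scores`) are hypothetical.

```python
from itertools import product


def fit_maxent(constraints, n_vars, sweeps=2000):
    """Approximate the maximum entropy distribution over n_vars binary
    variables subject to constraints of the form P(event) = target.

    Starts from the uniform distribution and repeatedly rescales the
    probability mass inside and outside each event (iterative scaling).
    """
    states = list(product((0, 1), repeat=n_vars))
    p = {s: 1.0 / len(states) for s in states}
    for _ in range(sweeps):
        for event, target in constraints:
            current = sum(p[s] for s in states if event(s))
            inside = target / current
            outside = (1.0 - target) / (1.0 - current)
            for s in states:
                p[s] *= inside if event(s) else outside
    return p


# A state is (a, b, r): term A assigned, term B assigned, document useful.
# Every number below is an illustrative assumption, not data from the paper.
FREQ_A, FREQ_B, FREQ_AB = 0.10, 0.20, 0.05   # collection (joint) frequencies
PREC_A, PREC_B = 0.6, 0.3                    # subjective term precisions
PRIOR_USEFUL = 0.10                          # assumed share of useful documents

CONSTRAINTS = [
    (lambda s: s[0] == 1, FREQ_A),
    (lambda s: s[1] == 1, FREQ_B),
    (lambda s: s[0] == 1 and s[1] == 1, FREQ_AB),
    (lambda s: s[0] == 1 and s[2] == 1, PREC_A * FREQ_A),
    (lambda s: s[1] == 1 and s[2] == 1, PREC_B * FREQ_B),
    (lambda s: s[2] == 1, PRIOR_USEFUL),
]

p = fit_maxent(CONSTRAINTS, n_vars=3)

# Probability of usefulness for each pattern of assigned query terms.
scores = {
    (a, b): p[(a, b, 1)] / (p[(a, b, 0)] + p[(a, b, 1)])
    for a, b in product((0, 1), repeat=2)
}

# Rank term patterns from most to least probably useful.
ranking = sorted(scores, key=scores.get, reverse=True)
for pattern in ranking:
    print(pattern, round(scores[pattern], 3))
```

Because the joint frequency of the two terms enters as a constraint, the score for a document carrying both terms is not simply a combination of the two precision estimates taken separately; this is the kind of statistical clue the abstract refers to.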
