Extended Boolean information retrieval

In conventional information retrieval, Boolean combinations of index terms are used to formulate the users' information requests. While any document is in principle retrievable by a Boolean query, the amount of output obtainable by Boolean processing is difficult to control, and the retrieved items are not ranked in any presumed order of importance to the user population. In the vector processing model of retrieval, the retrieved items are easily ranked in decreasing order of query-record similarity, but the queries themselves are unstructured and expressed as simple sets of weighted index terms. A new, extended Boolean information retrieval system is introduced that is intermediate between the Boolean system of query processing and the vector processing model. The query structure inherent in the Boolean system is preserved, while weighted terms may be incorporated into both queries and stored documents; the retrieved output can also be ranked in strict order of similarity to the user queries. A conventional retrieval system can be modified to make use of the extended system. Laboratory tests indicate that the extended system produces better retrieval output than either the Boolean or the vector processing systems.
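
The abstract gives no formulas, so the following is only a minimal sketch of how a system "intermediate between" Boolean and vector processing might score documents. It assumes the p-norm formulation commonly associated with extended Boolean retrieval, with unweighted query terms for brevity. The function names (`or_score`, `and_score`, `rank_and`), the parameter `p`, and the sample document weights are all hypothetical and chosen for illustration.

```python
def or_score(weights, p=2.0):
    """Assumed p-norm OR: closeness to the ideal point where all terms have weight 1,
    measured as distance from the all-zero point."""
    n = len(weights)
    return (sum(w ** p for w in weights) / n) ** (1.0 / p)


def and_score(weights, p=2.0):
    """Assumed p-norm AND: one minus the normalized distance from the all-one point."""
    n = len(weights)
    return 1.0 - (sum((1.0 - w) ** p for w in weights) / n) ** (1.0 / p)


def rank_and(docs, terms, p=2.0):
    """Rank document ids by the AND score of the given query terms.

    docs maps a document id to a dict of term weights in [0, 1];
    a term missing from a document is treated as weight 0.
    """
    scored = {
        doc_id: and_score([tw.get(t, 0.0) for t in terms], p)
        for doc_id, tw in docs.items()
    }
    return sorted(scored, key=scored.get, reverse=True)


# Hypothetical weighted documents, for illustration only.
docs = {
    "d1": {"information": 0.9, "retrieval": 0.8},
    "d2": {"information": 1.0, "retrieval": 0.0},
    "d3": {"information": 0.3, "retrieval": 0.3},
}
ranked = rank_and(docs, ["information", "retrieval"], p=2.0)
```

Under these assumptions, `p = 1` makes both operators reduce to the plain average of the term weights, which behaves like an unstructured vector-style match, while large `p` pushes OR toward the maximum weight and AND toward the minimum, which approaches strict Boolean behavior. In the sample data, a document that fully matches one term and misses the other (`d2`) still receives a nonzero AND score instead of being rejected outright, which is what allows a ranked output.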
