This article proposes a means of determining the similarity between search request formulations expressed as Boolean combinations of descriptors. The similarity measures introduced can be used to cluster Boolean search request formulations, and the resulting clusters can in turn be used in methods for clustering document representations. The work was motivated by earlier experimental research on information retrieval systems in which search request formulations are sets of descriptors, i.e., have the same form as document representations. Those experiments showed an advantage for document clustering methods that use previously formed clusters of search request formulations. With the proposed methodology for determining the similarity between Boolean combinations of descriptors, similar results might be expected for Boolean search request formulations. The introductory part of the article gives the background to the research reported. The underlying notions are then presented, and the proposed similarity measures are justified and illustrated by examples. Some experimental results concerning the suggested similarity measures are also given.
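To make the idea of a similarity between Boolean combinations of descriptors concrete, the following is a minimal sketch and not the measure defined in the article. It assumes one plausible choice: a Jaccard coefficient over the sets of descriptor combinations that satisfy each formulation. The function names, the descriptor vocabulary, and the representation of a formulation as a Python predicate are all hypothetical and chosen only for illustration.

```python
from itertools import product


def satisfying_assignments(query, descriptors):
    """Return every subset of `descriptors` (as a frozenset) for which `query` is true.

    `query` is a predicate that takes the set of descriptors present in a
    hypothetical document representation and returns True or False.
    """
    result = set()
    for values in product([False, True], repeat=len(descriptors)):
        present = frozenset(d for d, v in zip(descriptors, values) if v)
        if query(present):
            result.add(present)
    return result


def boolean_query_similarity(q1, q2, descriptors):
    """Jaccard coefficient of the satisfying assignments of two Boolean formulations.

    This is an illustrative assumption, not the article's measure.
    """
    a = satisfying_assignments(q1, descriptors)
    b = satisfying_assignments(q2, descriptors)
    union = a | b
    if not union:
        # Both formulations are unsatisfiable; treat them as identical by convention.
        return 1.0
    return len(a & b) / len(union)


# Hypothetical descriptor vocabulary and two Boolean formulations over it.
descriptors = ["retrieval", "clustering", "fuzzy"]
q1 = lambda doc: "retrieval" in doc and "clustering" in doc
q2 = lambda doc: "retrieval" in doc and ("clustering" in doc or "fuzzy" in doc)

print(boolean_query_similarity(q1, q2, descriptors))
```

Under these assumptions the two formulations share two of the three descriptor combinations that satisfy either of them, so the sketch yields a similarity of 2/3. Enumerating all combinations is exponential in the number of descriptors, so this approach is only practical for small vocabularies.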