The problem of finding nearest neighbors to a query in a document collection is a special case of associative retrieval, in which searches are performed using more than one key. A nearest neighbors associative retrieval algorithm, suitable for document retrieval using similarity matching, is described. The basic structure used is a binary tree; at each node, a set of keys (concepts) is tested to select the most promising branch. Backtracking to initially rejected branches is allowed and often necessary.
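A minimal Python sketch of this kind of search may make the idea concrete. It is not the algorithm of the paper: the similarity measure (number of shared concepts), the single-concept test at each node, the splitting rule, and the names `Node`, `build`, and `nearest` are all assumptions chosen for illustration. Each node stores the union of the concepts below it, which gives an upper bound on the similarity achievable in that subtree and lets the search skip a rejected branch when backtracking cannot improve the result.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple

Doc = Tuple[str, FrozenSet[str]]  # (document id, set of concepts)


@dataclass
class Node:
    vocab: FrozenSet[str]                 # union of concepts of all documents below
    docs: List[Doc]                       # non-empty only at leaves
    test_keys: FrozenSet[str] = frozenset()
    with_key: Optional["Node"] = None     # documents containing a tested key
    without_key: Optional["Node"] = None  # documents containing none of them


def build(docs: List[Doc], leaf_size: int = 2) -> Node:
    vocab = frozenset().union(*(concepts for _, concepts in docs))
    if len(docs) <= leaf_size or not vocab:
        return Node(vocab, docs)

    # Assumed splitting rule: test the concept that divides the documents most evenly.
    def imbalance(concept: str) -> int:
        n = sum(concept in concepts for _, concepts in docs)
        return abs(2 * n - len(docs))

    key = min(sorted(vocab), key=imbalance)
    left = [d for d in docs if key in d[1]]
    right = [d for d in docs if key not in d[1]]
    if not left or not right:
        return Node(vocab, docs)
    return Node(vocab, [], frozenset({key}),
                build(left, leaf_size), build(right, leaf_size))


def nearest(root: Node, query: set) -> Tuple[Optional[str], int, int]:
    """Return (best document id, its similarity, number of documents compared)."""
    best_sim, best_id, compared = -1, None, 0

    def visit(node: Node) -> None:
        nonlocal best_sim, best_id, compared
        # No document below can share more concepts with the query than this bound.
        if len(query & node.vocab) <= best_sim:
            return
        if node.with_key is None:  # leaf: compare against every stored document
            for doc_id, concepts in node.docs:
                compared += 1
                sim = len(query & concepts)
                if sim > best_sim:
                    best_sim, best_id = sim, doc_id
            return
        # Test the node's keys against the query to pick the most promising branch.
        if node.test_keys & query:
            first, second = node.with_key, node.without_key
        else:
            first, second = node.without_key, node.with_key
        visit(first)
        visit(second)  # backtrack; the bound check above prunes it when hopeless

    visit(root)
    return best_id, best_sim, compared


DOCS: List[Doc] = [
    ("d1", frozenset({"tree", "search", "binary"})),
    ("d2", frozenset({"cluster", "search"})),
    ("d3", frozenset({"tree", "neighbor", "retrieval"})),
    ("d4", frozenset({"index", "file"})),
    ("d5", frozenset({"cluster", "centroid"})),
    ("d6", frozenset({"tree", "neighbor", "retrieval", "document"})),
]

if __name__ == "__main__":
    print(nearest(build(DOCS), {"tree", "neighbor", "document"}))
```

Because the rejected branch is revisited unless its bound rules it out, this sketch returns a document with the same best similarity that a full scan would find, while usually comparing fewer documents.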
Under certain conditions, the search time required by this algorithm is O((log_2 N)^k), where N is the number of documents and k is a system-dependent parameter. A series of experiments with a small collection confirms the predictions made using the analytic model; k is approximately 4 in this situation.
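To get a feel for what this bound implies, the short Python sketch below compares (log_2 N)^k with N for k = 4. It reads the bound as the k-th power of log_2 N and ignores the constant factors hidden by the O notation, so the numbers indicate growth rates only, not measured search times. The function name `tree_cost` is hypothetical.

```python
import math


def tree_cost(n: int, k: int = 4) -> float:
    """Growth term (log2 n)^k, with constant factors ignored."""
    return math.log2(n) ** k


if __name__ == "__main__":
    # Compare the growth term with n, the cost of scanning every document.
    for exponent in (10, 16, 20, 24):
        n = 2 ** exponent
        print(f"N = 2^{exponent}: (log2 N)^4 = {tree_cost(n):.0f}, N = {n}")
```

Under these assumptions the two terms are equal at N = 2^16 = 65,536, since 16^4 = 65,536; below that size the sequential term is smaller, and above it the logarithmic term is smaller.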
This algorithm is compared with two other searching algorithms: sequential search and clustered search. For large collections, the average search time for this algorithm is less than that for a sequential search and greater than that for a clustered search. However, the clustered search, unlike the sequential search and this algorithm, does not guarantee that the near neighbors found are actually the nearest neighbors.
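The following toy Python example illustrates why a clustered search can miss the true nearest neighbor. It is a deliberately simplified sketch, not the clustered search evaluated in the paper: documents are stand-in points on a line, each cluster is represented by the mean of its points, and only the cluster with the closest centroid is searched. All names and data are hypothetical.

```python
from typing import List


def sequential_search(points: List[float], query: float) -> float:
    """Compare the query with every point; always finds the nearest one."""
    return min(points, key=lambda p: abs(p - query))


def clustered_search(clusters: List[List[float]], query: float) -> float:
    """Search only the cluster whose centroid (mean) is closest to the query."""
    chosen = min(clusters, key=lambda c: abs(sum(c) / len(c) - query))
    return min(chosen, key=lambda p: abs(p - query))


CLUSTERS = [[0.0, 4.0], [5.0, 11.0]]  # centroids are 2.0 and 8.0
QUERY = 4.9

if __name__ == "__main__":
    all_points = [p for cluster in CLUSTERS for p in cluster]
    print("sequential:", sequential_search(all_points, QUERY))
    print("clustered: ", clustered_search(CLUSTERS, QUERY))
```

Here the query lies closer to the first centroid, so the clustered search returns 4.0, although the point 5.0 in the other cluster is nearer to the query.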
[1] Ronald L. Rivest, et al. Analysis of associative retrieval algorithms, 1974.
[2] Walter A. Burkhard, et al. Some approaches to best-match file searching, 1973, Commun. ACM.
[3] Gerard Salton, et al. Dynamic information and library processing, 1975.
[4] Gerard Salton, et al. Automatic Information Organization And Retrieval, 1968.
[5] Gerard Salton, et al. The SMART Retrieval System—Experiments in Automatic Document Processing, 1971.
[6] Caroline M. Eastman, et al. A tree algorithm for nearest neighbor searching in document retrieval systems, 1978, SIGIR 1978.
[7] Keinosuke Fukunaga, et al. A Branch and Bound Algorithm for Computing k-Nearest Neighbors, 1975, IEEE Transactions on Computers.
[8] Jon Louis Bentley, et al. Quad trees a data structure for retrieval on composite keys, 1974, Acta Informatica.
[9] E. M. Keen, et al. X- X. an Analysis of the Documentation Requests, 1967.
[10] Jon Louis Bentley, et al. An Algorithm for Finding Best Matches in Logarithmic Expected Time, 1977, TOMS.