A comparison of a network structure and a database system used for document retrieval

Abstract: Database systems offer many advantages for implementing document retrieval systems. One of the main advantages is the integration of data and text handling in a single information system. However, it has not been clear how much a database implementation costs in terms of efficiency. In this paper, we compare a database implementation and a stand-alone implementation of a flexible representation of document content and its associated search strategies. The representation is a network of document nodes and index term nodes. The comparison shows that certain features of a database system can significantly affect the efficiency of the implementation. Despite this, a database implementation of a sophisticated document retrieval system appears to be competitive with a stand-alone implementation.
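
To make the two kinds of implementation concrete, the following is a minimal sketch and not the system evaluated in the paper. It assumes a toy collection (`DOCS`), a simple ranking that counts matching query terms, and SQLite as a stand-in database; the names `build_network`, `search_network`, and `search_database` are hypothetical. The first search follows term-to-document links held in memory, and the second runs the same search over a relational table of links.

```python
import sqlite3
from collections import defaultdict

# Hypothetical toy collection: document id -> index terms (an assumption).
DOCS = {
    "d1": {"database", "retrieval"},
    "d2": {"network", "retrieval"},
    "d3": {"database", "network", "retrieval"},
}


def build_network(docs):
    """Return term -> set of document ids, i.e. the term-to-document links."""
    term_links = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            term_links[term].add(doc_id)
    return term_links


def search_network(term_links, query_terms):
    """Stand-alone version: follow links from each query term and
    rank documents by the number of distinct query terms they match."""
    scores = defaultdict(int)
    for term in query_terms:
        for doc_id in term_links.get(term, ()):
            scores[doc_id] += 1
    # Highest score first; ties broken by document id.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def search_database(docs, query_terms):
    """Database version: store each link as a row and let SQL do the ranking.
    Assumes query_terms holds distinct terms."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE link (doc_id TEXT, term TEXT)")
    conn.executemany(
        "INSERT INTO link VALUES (?, ?)",
        [(doc_id, term) for doc_id, terms in docs.items() for term in terms],
    )
    placeholders = ",".join("?" for _ in query_terms)
    rows = conn.execute(
        "SELECT doc_id, COUNT(*) AS score FROM link "
        f"WHERE term IN ({placeholders}) "
        "GROUP BY doc_id ORDER BY score DESC, doc_id ASC",
        list(query_terms),
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    query = ["database", "network"]
    print(search_network(build_network(DOCS), query))
    print(search_database(DOCS, query))
```

Both functions return the same ranked list for a query of distinct terms. The difference the abstract is concerned with lies in cost: the stand-alone version walks in-memory links directly, while the database version pays for query processing in exchange for keeping text and data in one system.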
