Hardware for searching very large text databases

This paper discusses the problem of searching very large text databases. It shows that conventional techniques for searching current databases cannot be scaled up to larger ones, and that hardware that searches the database in parallel is needed if reasonable search times are expected. The part of the search process requiring the highest bandwidth is scanning the database to detect instances of search terms. Hardware methods for this scan that have been described in the literature are examined, and design criteria for term matchers are discussed. A new design that uses a nondeterministic finite state automaton to control matching is introduced, its operation is explained, and the practicality of using it in a real system is discussed.
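As a rough software illustration of term matching controlled by a nondeterministic finite state automaton, the sketch below scans a text once and tracks every automaton state that could be active after each character. It is not the hardware design the paper introduces. The function names `build_nfa` and `scan`, the restriction to literal terms, and the one-chain-per-term state layout are all assumptions made for this example.

```python
def build_nfa(terms):
    """Build a simple NFA for a set of literal search terms (illustrative only).

    State 0 is the shared start state; each term gets its own chain of states.
    """
    transitions = {}   # (state, char) -> set of next states
    accepting = {}     # state -> term recognized on entering that state
    next_state = 1
    for term in terms:
        state = 0
        for ch in term:
            transitions.setdefault((state, ch), set()).add(next_state)
            state = next_state
            next_state += 1
        accepting[state] = term
    return transitions, accepting


def scan(text, terms):
    """Scan text once and return sorted (end_index, term) pairs for every occurrence."""
    transitions, accepting = build_nfa(terms)
    active = set()
    hits = []
    for i, ch in enumerate(text):
        # The start state is always active, so a match may begin at any character.
        candidates = active | {0}
        active = set()
        for state in candidates:
            active |= transitions.get((state, ch), set())
        for state in active:
            if state in accepting:
                hits.append((i, accepting[state]))
    return sorted(hits)
```

Calling `scan("text database search", ["data", "base", "search"])` reports each term with the index of its last character. The inner loop over `candidates` is sequential here; in a hardware matcher the active states could in principle be updated simultaneously, which is the kind of parallelism the abstract argues for.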
