Specialized Parallel Architectures for Textual Databases

Publisher Summary: This chapter presents the concept of unformatted databases and the parallel architectures proposed for manipulating textual databases. Databases fall into two general categories: formatted and unformatted. Formatted databases are mainly time-variant entities and are subject to extensive alteration as well as search operations. Unformatted databases (bibliographic or full-text) are archival in nature and are processed by searching for a pattern or a combination of patterns. The chapter addresses the problem of searching large textual databases. Two major directions for improving the performance of such a lengthy operation are discussed: one based on the design of efficient algorithms for pattern-matching operations, and the other based on hardware implementation of the basic pattern-matching operations. Both approaches have their own merits and are subject to further research and study. The major theme of this chapter, however, is the design of the hardware pattern matcher. This emphasis is mainly due to advances in technology that have enabled the migration of software functions into hardware. Three different schemes for the hardware implementation of an efficient term comparator for specialized backend text retrieval architectures are also discussed.
