Massive Parallelism on the Hybrid Text-Retrieval Machine

This paper discusses the design of a high-performance, cost-effective machine for retrieving textual data. High performance and cost-effectiveness are achieved by combining low-cost hard disks, software filtering techniques, and a large amount of main memory. The discussion focuses on two components: the signature processor, which is based on the partitioned signature file technique, and the mass storage system, which is based on a disk array. A performance evaluation is presented for each of these components individually, as well as for the entire system.
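To make the idea of signature-based filtering concrete, the following is a minimal sketch of superimposed-coding signatures with a simple prefix-based partitioning. It is an illustration only, not the machine's actual design: the signature width, the number of bits set per word, the use of the leading signature bits as the partition key, and all function names are assumptions introduced here.

```python
import hashlib

# All three parameters are illustrative assumptions, not values from the paper.
SIG_BITS = 64       # width of each signature in bits
BITS_PER_WORD = 3   # bit positions set by each word
KEY_BITS = 2        # leading signature bits used as the partition key


def word_signature(word):
    """Hash a word to a signature with up to BITS_PER_WORD bits set."""
    sig = 0
    for i in range(BITS_PER_WORD):
        digest = hashlib.sha256(f"{i}:{word.lower()}".encode()).digest()
        sig |= 1 << (int.from_bytes(digest[:4], "big") % SIG_BITS)
    return sig


def block_signature(text):
    """Superimpose (bitwise OR) the signatures of all words in the text."""
    sig = 0
    for word in text.split():
        sig |= word_signature(word)
    return sig


def partition_key(sig):
    """Use the leading KEY_BITS of a signature as its partition key."""
    return sig >> (SIG_BITS - KEY_BITS)


def build_index(docs):
    """Group (doc_id, signature) pairs into partitions by key."""
    partitions = {}
    for doc_id, text in docs.items():
        sig = block_signature(text)
        partitions.setdefault(partition_key(sig), []).append((doc_id, sig))
    return partitions


def search(partitions, docs, query):
    """Return ids of documents that contain every query word."""
    qsig = block_signature(query)
    qkey = partition_key(qsig)
    query_words = query.lower().split()
    hits = []
    for key, entries in partitions.items():
        # A partition whose key lacks a bit set in the query key cannot
        # hold a matching signature, so it is skipped without scanning.
        if key & qkey != qkey:
            continue
        for doc_id, sig in entries:
            if sig & qsig == qsig:
                # Signature matches can be false drops; check the text.
                words = set(docs[doc_id].lower().split())
                if all(w in words for w in query_words):
                    hits.append(doc_id)
    return sorted(hits)
```

In this sketch a query first eliminates whole partitions by comparing keys, then compares signatures within the remaining partitions, and only reads the text of documents whose signatures qualify. The final text check is needed because superimposed coding can produce false matches.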
