Bit-Sliced Signature Files for Very Large Text Databases on a Parallel Machine Architecture

Free text retrieval is an important problem that can benefit significantly from a parallel architecture. Signature methods have been proposed for answering text retrieval queries on parallel machines [Sta88, LF92], under the assumption that main memory is large enough to hold the entire signature file. We propose a Parallel Bit-Sliced Signature File method for a SIMD machine architecture, intended for the case where the signature file exceeds the available memory. Rather than examining all the bit slices, we use a partial-fetch slice-swapping algorithm. With this method, performance degrades gracefully as the database grows. We provide formulae for the optimal number of signature slices to fetch and match against the query signature. Arithmetic examples show that our method can handle a 128 GB database with a 2-second response time on a machine with the characteristics of the Connection Machine.
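The idea of matching only some of the bit slices can be illustrated with a minimal sketch. It is not the paper's algorithm: the signature width `F`, the bits-per-word count `M`, the hash-based `word_bits` helper, and the `max_slices` parameter are all assumptions made for illustration, and the SIMD machine is imitated by bitwise AND on Python integers, with bit d of each slice standing for document d. The formulae for the optimal number of slices are not given in the abstract, so `max_slices` is simply chosen by the caller.

```python
import hashlib

F = 64  # signature width in bits (assumed value)
M = 3   # hash positions per word (assumed value)

def word_bits(word):
    """Return the set of bit positions a word turns on in an F-bit signature."""
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    return {int.from_bytes(digest[4 * i:4 * i + 4], "big") % F for i in range(M)}

def build_slices(docs):
    """Build F bit slices; bit d of slices[j] is 1 iff document d sets signature bit j."""
    slices = [0] * F
    for d, text in enumerate(docs):
        for word in text.lower().split():
            for j in word_bits(word):
                slices[j] |= 1 << d
    return slices

def candidates(slices, n_docs, query_words, max_slices=None):
    """AND together the slices selected by the query signature.

    If max_slices is given, only that many slices are fetched, which
    trades fewer slice reads for more false drops. Documents that
    truly contain all query words are never dismissed either way.
    """
    positions = sorted(set().union(*(word_bits(w.lower()) for w in query_words)))
    if max_slices is not None:
        positions = positions[:max_slices]
    result = (1 << n_docs) - 1  # start with every document as a candidate
    for j in positions:
        result &= slices[j]
    return [d for d in range(n_docs) if (result >> d) & 1]

docs = [
    "parallel text retrieval",
    "signature files on disk",
    "parallel signature matching",
]
slices = build_slices(docs)
full = candidates(slices, len(docs), ["parallel"])
partial = candidates(slices, len(docs), ["parallel"], max_slices=1)
```

Here `partial` is always a superset of `full`, and both contain documents 0 and 2. Any extra candidates are false drops that a later check against the actual text would have to filter out.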

[1]  Christos Faloutsos,et al.  Hybrid Index Organizations for Text Databases , 1992, EDBT.

[2]  Yuen Ren Chao,et al.  Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology , 1950 .

[3]  Christos Faloutsos,et al.  Design Considerations for a Message File Server , 1984, IEEE Transactions on Software Engineering.

[4]  Stavros Christodoulakis,et al.  Message files , 1982, TOIS.

[5]  Michael McGill,et al.  Introduction to Modern Information Retrieval , 1983 .

[6]  Harold S. Stone,et al.  Parallel Querying of Large Databases: A Case Study , 1987, Computer.

[7]  Christos Faloutsos,et al.  Fast Text Access Methods for Optical and Large Magnetic Disks: Designs and Performance Comparison , 1988, VLDB.

[8]  Dik Lun Lee,et al.  Partitioned signature files: design issues and performance evaluation , 1989, TOIS.

[9]  Christos Faloutsos,et al.  Description and performance analysis of signature file methods for office filing , 1987, TOIS.

[10]  Roger L. Haskin,et al.  Special-Purpose Processors for Text Retrieval , 1981 .

[11]  Christos Faloutsos,et al.  Description and performance analysis of signature file methods for office filing , 1987 .

[12]  Kotagiri Ramamohanarao,et al.  A two level superimposed coding scheme for partial match retrieval , 1983, Inf. Syst..

[13]  Zheng Lin.  CAT: an execution model for concurrent full text search , 1991, Proceedings of the First International Conference on Parallel and Distributed Information Systems.

[14]  Simon Stiassny Mathematical analysis of various superimposed coding methods , 1960 .

[15]  Zheng Lin,et al.  Frame-Sliced Signature Files , 1992, IEEE Trans. Knowl. Data Eng..