Design and evaluation of multikey access methods using signature files

Abstract: In this paper, we propose new multikey access methods that use signature files to handle both formatted and text data efficiently. The methods are based on term discrimination and signature clustering, and they substantially improve on the two-level signature file method proposed by Sacks-Davis et al. With term discrimination, we create separate, efficient access methods for discriminatory terms, that is, terms that are frequently used in user queries. In addition, we cluster similar signatures according to the similarity of their discriminatory terms to achieve better retrieval performance. We also provide a space-time analysis of the new multikey access methods and compare them with the two-level signature file method. We show that the new methods achieve about 20–45% savings in retrieval time but require 4–11% more storage overhead.
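The two ideas named in the abstract can be illustrated with a minimal sketch. It is not the paper's method: the signature width, the number of bits per term, the hash function, and the rule of grouping records by their exact set of discriminatory terms are all assumptions chosen for illustration, and every function name is hypothetical. The sketch builds superimposed-coding signatures, groups them by discriminatory terms, and skips whole groups that cannot satisfy a query.

```python
import hashlib

SIG_BITS = 64       # signature width in bits (assumed for illustration)
BITS_PER_TERM = 3   # hash positions set per term (assumed for illustration)


def term_signature(term: str) -> int:
    """Hash one term to a bit pattern with up to BITS_PER_TERM bits set."""
    sig = 0
    for i in range(BITS_PER_TERM):
        digest = hashlib.sha256(f"{i}:{term}".encode()).digest()
        pos = int.from_bytes(digest[:4], "big") % SIG_BITS
        sig |= 1 << pos
    return sig


def record_signature(terms) -> int:
    """Superimpose (bitwise OR) the signatures of all terms in a record."""
    sig = 0
    for term in terms:
        sig |= term_signature(term)
    return sig


def may_match(record_sig: int, query_sig: int) -> bool:
    """True if every query bit is set in the record signature.

    A True result can be a false drop; a False result is always correct.
    """
    return record_sig & query_sig == query_sig


def build_clusters(records, discriminatory_terms):
    """Group record signatures by the discriminatory terms each record holds.

    Hypothetical clustering rule: records with the same set of
    discriminatory terms share a cluster.
    """
    clusters = {}
    for rid, terms in records.items():
        key = frozenset(terms) & discriminatory_terms
        clusters.setdefault(key, []).append((rid, record_signature(terms)))
    return clusters


def search(clusters, query_terms, discriminatory_terms):
    """Return candidate record ids, skipping clusters that cannot match."""
    q_disc = frozenset(query_terms) & discriminatory_terms
    q_sig = record_signature(query_terms)
    candidates = []
    for key, members in clusters.items():
        if not q_disc <= key:
            continue  # cluster lacks a discriminatory term the query needs
        candidates.extend(rid for rid, sig in members if may_match(sig, q_sig))
    return candidates


records = {
    1: {"database", "signature", "file"},
    2: {"prolog", "clause", "index"},
    3: {"signature", "text", "retrieval"},
}
disc = frozenset({"signature", "prolog"})
clusters = build_clusters(records, disc)
result = search(clusters, {"signature", "text"}, disc)
```

Here `result` always contains record 3 and never contains record 2, whose cluster is skipped without examining its signature. Record 1 may appear as a false drop, so candidates would still need to be checked against the stored data.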
