Multikey access methods based on term discrimination and signature clustering

To improve the two-level signature file method designed by Sacks-Davis et al. [20], we propose new multikey access methods based on term discrimination and signature clustering. Using term discrimination, we build separate, efficient access methods for the terms that appear frequently in user queries. In addition, we cluster similar signatures by means of these terms to achieve good retrieval performance. We also provide a space-time analysis of the proposed methods and compare them with the two-level signature file method. Our analysis shows that the proposed methods reduce retrieval time by 15-30% while requiring 3-9% more storage overhead.
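
The general idea can be illustrated with a small sketch. This is not the authors' method, only a toy under stated assumptions: each record gets a superimposed-coding signature, records are grouped by the frequently queried terms they contain, and a query that mentions such a term scans only that group. The class name `DiscriminatedIndex`, the 64-bit signature width, the three bits per term, and the SHA-256 hashing are all hypothetical choices made for illustration.

```python
import hashlib

SIG_BITS = 64       # assumed signature width (illustrative only)
BITS_PER_TERM = 3   # assumed number of bits set per term (illustrative only)


def term_signature(term):
    """Hash a term to a bit pattern with at most BITS_PER_TERM bits set."""
    sig = 0
    for i in range(BITS_PER_TERM):
        digest = hashlib.sha256(f"{i}:{term}".encode()).digest()
        sig |= 1 << (int.from_bytes(digest[:4], "big") % SIG_BITS)
    return sig


def record_signature(terms):
    """Superimpose (bitwise OR) the signatures of all terms in a record."""
    sig = 0
    for term in terms:
        sig |= term_signature(term)
    return sig


class DiscriminatedIndex:
    """Hypothetical index: signatures grouped by frequently queried terms."""

    def __init__(self, frequent_terms):
        self.frequent_terms = set(frequent_terms)
        self.records = []    # record id -> (set of terms, signature)
        self.clusters = {}   # frequent term -> list of record ids

    def add(self, terms):
        rid = len(self.records)
        terms = set(terms)
        self.records.append((terms, record_signature(terms)))
        for term in terms & self.frequent_terms:
            self.clusters.setdefault(term, []).append(rid)
        return rid

    def search(self, query_terms):
        query = set(query_terms)
        qsig = record_signature(query)
        frequent = query & self.frequent_terms
        if frequent:
            # Scan only the smallest cluster among the frequent query terms.
            candidates = min((self.clusters.get(t, []) for t in frequent), key=len)
        else:
            # No frequent term in the query: fall back to scanning everything.
            candidates = range(len(self.records))
        matches = []
        for rid in candidates:
            terms, sig = self.records[rid]
            # Signature test filters candidates; the exact subset test
            # removes false drops caused by colliding bit patterns.
            if sig & qsig == qsig and query <= terms:
                matches.append(rid)
        return matches
```

A query containing a frequent term touches only the records in that term's cluster, which is the intuition behind trading some extra storage for faster retrieval; the actual structures and cost model are those given in the paper, not this sketch.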

[1] Christos Faloutsos, et al. Design Considerations for a Message File Server, 1984, IEEE Transactions on Software Engineering.

[2] Ian A. Macleod, et al. The Array Model: A conceptual modeling approach to document retrieval, 1987, J. Am. Soc. Inf. Sci.

[3] Roger L. Haskin, et al. On extending the functions of a relational database system, 1982, SIGMOD '82.

[4] Soon Myoung Chung, et al. Computer Architecture for a Surrogate File to a Very Large Data/Knowledge Base, 1987, Computer.

[5] Jae-Woo Chang, et al. Multikey Access Scheme Based on Term Discrimination and Signature Clustering, 1989, DASFAA.

[6] Kotagiri Ramamohanarao, et al. A Superimposed Codeword Indexing Scheme for Very Large Prolog Databases, 1986, ICLP.

[7] Harry D. Huskey, et al. An information retrieval system based on superimposed coding, 1969, AFIPS '69 (Fall).

[8] Robert M. Colomb, et al. A Clause Indexing system for PROLOG based on Superimposed Coding, 1986, Aust. Comput. J.

[9] Christos Faloutsos, et al. Design of a Signature File Method that Accounts for Non-Uniform Occurrence and Query Frequencies, 1985, VLDB.

[10] C.S. Roberts, et al. Partial-match retrieval via the method of superimposed codes, 1979, Proceedings of the IEEE.

[11] Michael McGill, et al. Introduction to Modern Information Retrieval, 1983.

[12] Kotagiri Ramamohanarao, et al. A Superimposed Coding Scheme Based on Multiple Block Descriptor Files for Indexing Very Large Data Bases, 1988, VLDB.

[13] Christos Faloutsos, et al. Fast Text Access Methods for Optical and Large Magnetic Disks: Designs and Performance Comparison, 1988, VLDB.

[14] Christos Faloutsos, et al. Description and performance analysis of signature file methods for office filing, 1987, TOIS.

[15] Christos Faloutsos, et al. Access methods for text, 1985, CSUR.

[16] Harry D. Huskey, et al. An information retrieval system based on superimposed coding, 1969, AFIPS '69 (Fall).

[17] Kotagiri Ramamohanarao, et al. Multikey access methods based on superimposed coding techniques, 1987, TODS.

[18] Kotagiri Ramamohanarao, et al. A two level superimposed coding scheme for partial match retrieval, 1983, Inf. Syst.

[19] Bipin C. Desai, et al. A data model for use with formatted and textual data, 1986, J. Am. Soc. Inf. Sci.

[20] Donald E. Knuth, et al. The Art of Computer Programming, Vol. 3: Sorting and Searching, 1974.

[21] Ron Sacks-Davis, et al. Performance of a multi-key access method based on descriptors and superimposed coding techniques, 1985, Inf. Syst.

[22] Christos Faloutsos, et al. Signature files: design and performance comparison of some signature extraction methods, 1985, SIGMOD Conference.