Two widely applicable textbase indexing methods are the inverted index and the Superimposed Coding based Signature File (SC-SF). The former is more efficient in query processing, whereas the latter excels in storage utilization. Building on previous results, we propose a new hybrid structure, S-Index, whose performance is tunable. At one extreme, S-Index becomes a signature file with zero information loss, so that queries are processed faster than in an ordinary SC-SF. At the other extreme, S-Index becomes an inverted index. The advantage of the proposed access method is that the textbase index can be tailored to the query profiles of user classes: for frequently queried textbase sections, S-Index performs like an inverted index, whereas the bulk of the textbase is indexed as a signature file. The S-Index structure is presented in detail, together with performance analysis results.
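To make the contrast between the two baseline structures concrete, here is a minimal Python sketch over a toy in-memory textbase. The signature width `F`, the bits-per-word `M`, the MD5-based bit selection and all class names are illustrative assumptions. The signature file ORs word signatures into one block signature per document, so a match is only a candidate (a possible false drop) and must be checked against the text. The inverted index returns exact posting sets. `NaiveHybrid` is a hypothetical routing scheme that only echoes the idea of indexing frequently queried documents with the inverted index and the rest with a signature file. It is not the S-Index structure, whose internals are not described here.

```python
import hashlib

# Hypothetical parameters: signature width F (bits) and bits set per word M.
F = 64
M = 3


def word_signature(word: str, f: int = F, m: int = M) -> int:
    """Map a word to an f-bit pattern with at most m bits set (positions may collide)."""
    sig = 0
    for i in range(m):
        digest = hashlib.md5(f"{i}:{word}".encode()).digest()
        sig |= 1 << (int.from_bytes(digest[:4], "big") % f)
    return sig


def block_signature(words) -> int:
    """Superimposed coding: OR together the signatures of all words in a block."""
    sig = 0
    for w in words:
        sig |= word_signature(w)
    return sig


class SignatureFile:
    """Stores one signature per document; lossy, so matches are only candidates."""

    def __init__(self):
        self.sigs = {}  # doc_id -> block signature

    def add(self, doc_id, words):
        self.sigs[doc_id] = block_signature(words)

    def candidates(self, word):
        q = word_signature(word)
        # Every query bit must be set in the block signature; false drops are possible.
        return {d for d, s in self.sigs.items() if s & q == q}


class InvertedIndex:
    """Maps each word to the exact set of documents containing it."""

    def __init__(self):
        self.postings = {}  # word -> set of doc_ids

    def add(self, doc_id, words):
        for w in words:
            self.postings.setdefault(w, set()).add(doc_id)

    def lookup(self, word):
        return set(self.postings.get(word, ()))


class NaiveHybrid:
    """Hypothetical hybrid, NOT S-Index: hot documents go to the inverted index,
    the bulk goes to the signature file."""

    def __init__(self, hot_docs):
        self.hot = set(hot_docs)
        self.inv = InvertedIndex()
        self.sf = SignatureFile()
        self.text = {}  # doc_id -> word set, used to resolve false drops

    def add(self, doc_id, words):
        words = set(words)
        self.text[doc_id] = words
        target = self.inv if doc_id in self.hot else self.sf
        target.add(doc_id, words)

    def query(self, word):
        exact = self.inv.lookup(word)
        # Signature-file hits are checked against the text to discard false drops.
        verified = {d for d in self.sf.candidates(word) if word in self.text[d]}
        return exact | verified


docs = {
    1: "inverted index query processing".split(),
    2: "signature file storage utilization".split(),
    3: "superimposed coding signature".split(),
}
h = NaiveHybrid(hot_docs={1})
for doc_id, words in docs.items():
    h.add(doc_id, words)

print(sorted(h.query("signature")))  # [2, 3]
print(sorted(h.query("index")))      # [1]
```

Because every signature-file candidate is verified against the stored text, the sketch pays extra work at query time in exchange for a compact, lossy index, which mirrors the query-speed versus storage trade-off between inverted indexes and SC-SF.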