Surrogate subsets: a free space management strategy for the index of a text retrieval system

This paper presents a new data structure and an associated strategy for use by the indexing facilities of text retrieval systems. The paper begins by reviewing some of the goals that may be considered when designing such an index and continues with a brief survey of current strategies. It then presents an indexing strategy, referred to as surrogate subsets, and discusses its appropriateness in light of the specified goals. Various design issues and implementation details are also discussed. Our strategy requires that a surrogate file be divided into a large number of subsets separated by free space, which allows the index to expand when new material is appended to the database. Experimental results report on the utilization of this free space as the database is enlarged.
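The abstract does not say how surrogates are assigned to subsets, how much free space each subset reserves, or what happens when a subset fills up. The sketch below is therefore a hypothetical illustration only, not the paper's method. It assumes a hash-based placement rule (`zlib.crc32` of a key), a fixed free-space fraction per subset, and a simple overflow list. The names `SurrogateFile`, `Subset`, `expected_load`, and `free_fraction` are invented for illustration. It shows the general idea of subsets padded with free space that absorb appended surrogates, and it measures free-space utilization as the fraction of reserved slots in use.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Subset:
    """One subset of the surrogate file: stored surrogates plus reserved free slots."""
    capacity: int
    entries: list = field(default_factory=list)

    @property
    def free(self) -> int:
        # Slots still available for surrogates appended later.
        return self.capacity - len(self.entries)


class SurrogateFile:
    """Hypothetical sketch, not the paper's layout: a surrogate file split
    into many fixed-capacity subsets, each padded with free space."""

    def __init__(self, num_subsets: int, expected_load: int, free_fraction: float):
        # Assumed policy: each subset reserves room for its expected load
        # plus a fixed fraction of extra free slots.
        capacity = expected_load + int(expected_load * free_fraction)
        self.subsets = [Subset(capacity) for _ in range(num_subsets)]
        # Assumed policy: surrogates whose subset has no free space left.
        self.overflow: list = []

    def subset_index(self, key: str) -> int:
        # Assumed placement rule: a stable hash of the key picks the subset.
        return zlib.crc32(key.encode("utf-8")) % len(self.subsets)

    def append(self, key: str, surrogate) -> bool:
        """Place a new surrogate in its subset's free space; return False on overflow."""
        subset = self.subsets[self.subset_index(key)]
        if subset.free > 0:
            subset.entries.append(surrogate)
            return True
        self.overflow.append((key, surrogate))
        return False

    def utilization(self) -> float:
        """Fraction of all reserved slots (used plus free) that are occupied."""
        used = sum(len(s.entries) for s in self.subsets)
        total = sum(s.capacity for s in self.subsets)
        return used / total


if __name__ == "__main__":
    sf = SurrogateFile(num_subsets=8, expected_load=10, free_fraction=0.5)
    for i in range(100):
        sf.append(f"doc{i}", f"surrogate-{i}")
    print(f"utilization={sf.utilization():.2f}, overflow={len(sf.overflow)}")
```

Under these assumptions, a subset that runs out of free space sends further surrogates to the overflow list. The overflow count is then one rough indicator of how well the reserved free space matched the growth of the database.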
