Design of a Signature File Method that Accounts for Non-Uniform Occurrence and Query Frequencies

In this paper we study a variation of the signature file access method for text and attribute retrieval. According to this method, the documents (or records) are stored sequentially in the "text file". Abstractions ("signatures") of the documents (or records) are stored in the "signature file". The latter serves as a filter on retrieval: it helps discard a large number of nonqualifying documents. We propose a signature extraction method that takes into account the query and occurrence frequencies of terms, thus achieving better performance. The model we present is general enough that its results apply not only to text retrieval but also to files with formatted data.
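To make the filtering idea concrete, the following is a minimal sketch of a signature file built with generic superimposed coding; it is not the extraction method proposed in the paper. The signature width `SIG_BITS`, the default bit count `DEFAULT_BITS`, the SHA-256-based hashing, and the optional `bits_per_word` mapping (a hypothetical hook for giving some words more or fewer bits, for instance according to their query or occurrence frequency) are all assumptions made for illustration.

```python
import hashlib

SIG_BITS = 64      # signature width in bits (assumed value)
DEFAULT_BITS = 3   # bit positions hashed per word (assumed value)


def word_signature(word, nbits=DEFAULT_BITS):
    """Hash a word to `nbits` bit positions (collisions may set fewer bits)."""
    sig = 0
    for i in range(nbits):
        digest = hashlib.sha256(f"{i}:{word.lower()}".encode()).digest()
        sig |= 1 << (int.from_bytes(digest[:4], "big") % SIG_BITS)
    return sig


def document_signature(text, bits_per_word=None):
    """OR together (superimpose) the signatures of all words in a document."""
    bits_per_word = bits_per_word or {}  # hypothetical per-word bit budget
    sig = 0
    for word in text.split():
        w = word.lower()
        sig |= word_signature(w, bits_per_word.get(w, DEFAULT_BITS))
    return sig


def search(query_word, documents, signatures, bits_per_word=None):
    """Scan the signature file first, then verify candidates in the text file."""
    bits_per_word = bits_per_word or {}
    q = query_word.lower()
    qsig = word_signature(q, bits_per_word.get(q, DEFAULT_BITS))
    hits = []
    for doc, sig in zip(documents, signatures):
        if sig & qsig == qsig:            # signature qualifies: candidate
            if q in doc.lower().split():  # discard false drops using the text
                hits.append(doc)
    return hits


if __name__ == "__main__":
    text_file = ["signature files filter documents", "inverted index for text"]
    signature_file = [document_signature(d) for d in text_file]
    print(search("filter", text_file, signature_file))
```

A document whose signature lacks any bit of the query signature cannot contain the query word, so it is skipped without reading its text. A signature that does qualify may still be a false drop, which is why the candidate is checked against the text itself.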
