Design and performance evaluation of signature files for text retrieval systems

Signature files can be used to support information retrieval in library systems, artificial intelligence, and office automation systems. However, conventional signature file techniques require an exhaustive scan of the signature file itself. In addition, because information is lost when signatures are extracted, non-qualifying data may be retrieved by mistake (false drops). The aim of this study was to construct an efficient signature file structure, called the partitioned signature file, that eliminates a large number of unqualified signatures from the search. An optimal weighting strategy was also proposed, which hashes each term into its signature with a term-specific weight so as to minimize the false drop probability. The partitioned signature file reduced the number of signatures searched by 10% to 80%, depending strongly on the parameter values. The false drop rate of the new weighting strategy was only 30% to 50% of that of the uniform weighting strategy. These designs can be used together to enhance retrieval performance.
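The following is a minimal Python sketch of the general idea, not the method of this study. Terms are hashed into block signatures by superimposed coding, signatures are grouped by a partition key, and a query scans only the partitions whose key could contain it. Several details are hypothetical choices made for illustration:

- the partitioning rule (the low-order `KEY_BITS` bits of each signature serve as the key);
- the SHA-256-based bit selection;
- the parameter values `SIG_WIDTH`, `KEY_BITS`, and `DEFAULT_WEIGHT`;
- the class `PartitionedSignatureFile`.

A term's weight is modeled only as the number of bits it sets. How the study derives optimal weights is not reproduced here.

```python
import hashlib
from collections import defaultdict

SIG_WIDTH = 64      # hypothetical signature width F, in bits
KEY_BITS = 4        # hypothetical: low-order bits used as the partition key
DEFAULT_WEIGHT = 3  # assumed bits set per term under uniform weighting


def term_signature(term: str, weight: int, width: int = SIG_WIDTH) -> int:
    """Set `weight` distinct bit positions derived by hashing the term."""
    if not 0 < weight <= width:
        raise ValueError("weight must be between 1 and the signature width")
    positions = set()
    counter = 0
    while len(positions) < weight:
        digest = hashlib.sha256(f"{term}#{counter}".encode()).digest()
        positions.add(int.from_bytes(digest[:4], "big") % width)
        counter += 1
    sig = 0
    for p in positions:
        sig |= 1 << p
    return sig


class PartitionedSignatureFile:
    """Block signatures grouped by a key taken from their low-order bits."""

    def __init__(self, weights=None, key_bits: int = KEY_BITS):
        # Per-term weights (hypothetical); unlisted terms use DEFAULT_WEIGHT.
        # Stored here so indexing and querying always use the same weights.
        self.weights = weights or {}
        self.key_mask = (1 << key_bits) - 1
        self.partitions = defaultdict(list)  # key -> [(doc_id, signature), ...]

    def _signature(self, terms) -> int:
        # Superimposed coding: OR together the signatures of all terms.
        sig = 0
        for t in terms:
            sig |= term_signature(t, self.weights.get(t, DEFAULT_WEIGHT))
        return sig

    def add(self, doc_id, terms) -> None:
        sig = self._signature(terms)
        self.partitions[sig & self.key_mask].append((doc_id, sig))

    def search(self, query_terms):
        """Return (candidate doc ids, number of signatures examined)."""
        q = self._signature(query_terms)
        q_key = q & self.key_mask
        candidates, examined = [], 0
        for key, entries in self.partitions.items():
            # A partition can hold a match only if its key has every query key bit set;
            # otherwise it is skipped without scanning any of its signatures.
            if key & q_key != q_key:
                continue
            for doc_id, sig in entries:
                examined += 1
                if sig & q == q:
                    candidates.append(doc_id)
        return candidates, examined


docs = {
    1: ["signature", "file", "retrieval"],
    2: ["office", "automation"],
    3: ["library", "retrieval", "system"],
}
sf = PartitionedSignatureFile(weights={"retrieval": 4})
for doc_id, terms in docs.items():
    sf.add(doc_id, terms)
candidates, examined = sf.search(["retrieval"])
# Candidates may include false drops, so each one is checked against the actual text.
hits = [d for d in candidates if "retrieval" in docs[d]]
print(sorted(hits), examined)
```

Signature matching never misses a qualifying block, but it can admit false drops, so `search` returns candidates that still have to be verified against the original text. The `examined` count is the quantity a partitioned structure tries to reduce relative to scanning every signature.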