Perfect Encoding: a Signature Method for Text Retrieval

This paper introduces a new methodology in which blocks of text are replaced by a compressed, fully reversible signature pattern. Because full reversibility implies zero information loss, the method is termed Perfect Encoding. An analytical model of the method is developed and, where applicable, contrasted with current practice in signature file organizations. The analysis indicates that the method is a potential candidate for information retrieval implementations. In particular, perfect encoding has the potential to develop into an alternative or complementary scheme to systems based on inverted files or signature files.
