A search scheme resulting in mixing compressed text files

Searching compressed text files directly is a very useful technique: compression not only reduces the storage space required for a text file but also speeds up searching. Furthermore, protecting secret documents is a basic and important requirement of computer systems. In this paper, we present a more secure compression and decompression technique for large natural-language texts. The merits of our method are: (1) word-based approximate matching can be performed directly on the securely compressed text; (2) a word can be updated directly in the compressed text; (3) decompression can start at the position of a search result; and (4) the search process requires no complex encryption computation. The scheme is simple, and its search phase runs in only O(n) time, which makes it very practical. We believe the technique has great potential to be extended to other problems, such as private information retrieval and encrypted databases.
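The abstract does not spell out how the scheme is built, so the following is only a hypothetical sketch of how the four listed properties can coexist in a word-based compressed representation. It is not the authors' method. Every function name is illustrative, and several pieces are assumptions. The text is split into word and non-word tokens, and each distinct token is mapped to an integer codeword. The codeword order is derived from an HMAC-SHA256 of the token under a secret key, which stands in for the paper's secure encoding and would still leak word frequencies. Approximate matching uses edit distance over the small vocabulary and is followed by a single linear scan of the codeword sequence, a common approach in word-based compressed search.

```python
import hashlib
import hmac
import re


def tokenize(text):
    # Split into alternating word / non-word tokens so that
    # concatenating all tokens reproduces the original text exactly.
    return re.findall(r"\w+|\W+", text)


def build_codebook(tokens, key):
    # Assumption: codewords are assigned in an order given by a keyed hash
    # (HMAC-SHA256) of each distinct token, so the mapping depends on the key.
    # Illustration only: a fixed substitution code still leaks frequencies.
    vocab = sorted(set(tokens),
                   key=lambda w: hmac.new(key, w.encode(), hashlib.sha256).digest())
    encode = {w: i for i, w in enumerate(vocab)}
    return encode, vocab  # vocab[i] is the token for codeword i


def compress(text, key):
    tokens = tokenize(text)
    encode, decode = build_codebook(tokens, key)
    return [encode[t] for t in tokens], encode, decode


def edit_distance(a, b):
    # Levenshtein distance computed with a single rolling row.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def approx_search(codes, encode, pattern, k):
    # Step 1: find vocabulary words within edit distance k of the pattern.
    targets = {c for w, c in encode.items()
               if re.fullmatch(r"\w+", w) and edit_distance(w, pattern) <= k}
    # Step 2: one pass over the n codewords, i.e. O(n) for this phase.
    return [pos for pos, c in enumerate(codes) if c in targets]


def decompress_from(codes, decode, start, count):
    # Decompression can begin at any codeword position, e.g. a search hit.
    return "".join(decode[c] for c in codes[start:start + count])


def update_word(codes, encode, decode, pos, new_word):
    # Replace the token at codeword position pos in place;
    # a previously unseen word is appended to the codebook.
    if new_word not in encode:
        encode[new_word] = len(decode)
        decode.append(new_word)
    codes[pos] = encode[new_word]
```

For example, after compressing "the quick brown fox jumps over the lazy dog", `approx_search(codes, encode, "quack", 1)` returns `[2]`, which is the token position of "quick". Then `decompress_from(codes, decode, 2, 3)` yields "quick brown". Because each token maps to exactly one codeword, positions in the compressed sequence line up with token positions. This alignment is what makes in-place updates and decompression from a search hit straightforward in this sketch.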
