Improvement of Text Compression Parameters Using Cluster Analysis

Several actions are usually performed when a document is appended to a textual database in an information retrieval system. The most frequent are compression of the document and cluster analysis of the textual database to improve the quality of answers to users' queries. The information obtained from clustering can be very helpful in compression. This paper presents word-based compression that uses information about the cluster hierarchy. Some experimental results are provided at the end of the paper.
