Compression and machine learning: a new perspective on feature space vectors

The use of compression algorithms in machine learning tasks such as clustering and classification has appeared in a variety of fields, sometimes with the promise of reducing problems of explicit feature selection. The theoretical justification for such methods has been founded on an upper bound on Kolmogorov complexity and an idealized information space. An alternate view shows that compression algorithms implicitly map strings into feature space vectors, and that compression-based similarity measures compute similarity within these feature spaces. Thus, compression-based methods are not a "parameter free" magic bullet for feature selection and data representation, but are instead concrete similarity measures within defined feature spaces, and are therefore akin to explicit feature vector models used in standard machine learning algorithms. To underscore this point, we find theoretical and empirical connections between traditional machine learning vector models and compression, encouraging cross-fertilization in future work.
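A compression-based similarity measure of the kind discussed above can be sketched with the normalized compression distance (NCD), shown here using Python's standard `zlib` compressor. The choice of compressor is an assumption made for illustration; any real compressor only approximates the idealized Kolmogorov quantities, so the resulting values are estimates, not the exact measure:

```python
import zlib


def clen(data: bytes) -> int:
    """Length of data after compression with zlib (a stand-in for
    the ideal, uncomputable Kolmogorov complexity C(x))."""
    return len(zlib.compress(data, level=9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance:
    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)).
    Values near 0 indicate similar strings; values near 1,
    dissimilar strings."""
    cx, cy = clen(x), clen(y)
    cxy = clen(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    a = b"the quick brown fox jumps over the lazy dog " * 20
    b = b"the quick brown fox jumps over the lazy dog " * 20
    c = bytes(range(256)) * 4  # structurally unrelated data
    print(ncd(a, b))  # small: shared structure compresses away
    print(ncd(a, c))  # larger: little shared structure
```

Intuitively, when `x` and `y` share structure, the compressor reuses patterns from `x` while encoding `y` in the concatenation, so `C(xy)` is not much larger than `min(C(x), C(y))` and the distance is small; this is the implicit feature-space matching the abstract describes.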
