Optimal Representation in Average Using Kolmogorov Complexity

Abstract: It is known from algorithmic complexity theory [2–5, 8, 14] that a word is incompressible on average. For words of the pattern x^m (x repeated m times), it is natural to believe that providing x and m is an optimal representation on average. By contrast, for words of the form x ⊕ y (i.e., the bitwise exclusive or of x and y), providing x and y is not an optimal description on average. In this work, we sketch a theory of average optimal representation that formalizes natural ideas and still applies where intuition does not suffice. We first formulate a definition of K-optimality on average for a pattern, then prove results that corroborate the intuitive ideas and give useful insights into the best compression in more complex cases.
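The contrast between the two patterns can be illustrated with a small counting sketch. This is only an illustration under simplifying assumptions, not the paper's formal argument: it counts bits naively instead of computing Kolmogorov complexity (which is uncomputable), and the function names `xor_words`, `pair_description_cost` and `power_description` are hypothetical. For n-bit words, every result of a bitwise exclusive or is produced by exactly 2^n pairs, so spelling out the pair costs 2n bits where the n-bit result itself would do. A repeated word, on the other hand, is fully recovered from x and m.

```python
from collections import Counter
from itertools import product


def xor_words(x: str, y: str) -> str:
    """Bitwise exclusive or of two equal-length binary words given as strings."""
    return "".join("1" if a != b else "0" for a, b in zip(x, y))


def pair_description_cost(n: int) -> dict:
    """Enumerate all pairs of n-bit words and count how often each XOR result occurs.

    Illustrative only: bit counts are naive lengths, not Kolmogorov complexity.
    """
    words = ["".join(bits) for bits in product("01", repeat=n)]
    counts = Counter(xor_words(x, y) for x in words for y in words)
    return {
        "distinct_results": len(counts),          # number of different words x XOR y
        "pairs_per_result": set(counts.values()),  # how many pairs map to each result
        "bits_for_pair": 2 * n,                    # naive cost of writing down x and y
        "bits_for_result": n,                      # naive cost of writing down the result
    }


def power_description(x: str, m: int) -> str:
    """Rebuild the repeated word from its description (x, m)."""
    return x * m


if __name__ == "__main__":
    print(pair_description_cost(3))
    print(power_description("01", 4))
```

For n = 3 the enumeration finds 8 distinct results, each reached by 8 different pairs, which is why the pair description is redundant by n bits in this naive accounting.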

[1] M. Dauchet, et al. Compression and genetic sequence analysis, 1996, Biochimie.

[2] H. P. Yockey, et al. Information Theory and Molecular Biology, 1992.

[3] Andrew V. Goldberg, et al. Compression and Ranking, 1991, SIAM J. Comput.

[4] Wojciech Rytter, et al. Text Algorithms, 1994.

[5] Jean-Paul Delahaye, et al. Detection of significant patterns by compression algorithms: the case of approximate tandem repeats in DNA sequences, 1997, Comput. Appl. Biosci.

[6] Osamu Watanabe, et al. Kolmogorov Complexity and Computational Complexity, 2012, EATCS Monographs on Theoretical Computer Science.

[7] David A. Huffman, et al. A method for the construction of minimum-redundancy codes, 1952, Proceedings of the IRE.

[8] L. Goddard. Information Theory, 1962, Nature.

[9] Robert B. Ash, et al. Information Theory, 2020, The SAGE International Encyclopedia of Mass Media and Society.

[10] Claude E. Shannon, et al. The mathematical theory of communication, 1950.

[11] Ming Li, et al. An Introduction to Kolmogorov Complexity and Its Applications, 2019, Texts in Computer Science.

[12] Cristian S. Calude. Information and Randomness: An Algorithmic Perspective, 1994.

[13] J. Delahaye. Information, complexité et hasard, 1994.

[14] C. E. Shannon, et al. A mathematical theory of communication, 1948, MOCO.

[15] James A. Storer, et al. Data Compression: Methods and Theory, 1987.

[16] Max Dauchet, et al. A first step toward chromosome analysis by compression algorithms, 1995, Proceedings First International Symposium on Intelligence in Neural and Biological Systems. INBS'95.

[17] Claude E. Shannon, et al. A Mathematical Theory of Communications, 1948.