Malleable Coding: Compressed Palimpsests

A malleable coding scheme considers not only compression efficiency but also the ease of alteration, thereby encouraging some form of recycling of an old compressed version in the formation of a new one. Malleability cost is the difficulty of synchronizing compressed versions, so malleable codes are of particular interest when representing information and modifying the representation are both expensive. We examine the trade-off between compression efficiency and malleability cost under a malleability metric defined with respect to a string edit distance. This problem introduces a metric topology to the compressed domain. We characterize the achievable rates and malleability as the solution of a subgraph isomorphism problem, which can be used to argue that allowing the conditional entropy of the edited message given the original message to grow linearly with the block length creates an exponential increase in code length.
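
To make the trade-off concrete, the sketch below (an illustration, not code from the paper) compresses a source string and a lightly edited copy of it, and for two encodings reports the code length alongside the edit distance between the resulting codewords. The identity encoding, the zlib compressor, and the example strings are illustrative assumptions standing in for the paper's codes and malleability metric; the point is that a stronger compressor shortens the representation but can move nearby source strings far apart in the compressed domain.

```python
# Illustrative sketch (not from the paper): how a small edit to the source
# string shows up in the compressed domain under two different encodings.
import zlib

def edit_distance(a: bytes, b: bytes) -> int:
    """Levenshtein distance between two byte strings (textbook dynamic program)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute
        prev = curr
    return prev[-1]

# An original message and a copy containing one small edit.
original = b"the quick brown fox jumps over the lazy dog " * 20
edited = original.replace(b"lazy", b"eager", 1)

encodings = [
    ("identity (no compression)", lambda s: s),
    ("zlib level 9", lambda s: zlib.compress(s, 9)),
]

for name, encode in encodings:
    x, y = encode(original), encode(edited)
    print(f"{name}: codeword length {len(x)} bytes, "
          f"edit distance between old and new codewords = {edit_distance(x, y)}")
```

The Levenshtein routine is the standard dynamic program; any string edit distance would serve the same illustrative purpose of measuring how far apart the old and new codewords lie.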
