Compressed Communication Complexity of Hamming Distance

We consider the communication complexity of the Hamming distance of two strings. Bille et al. [SPIRE 2018] considered the communication complexity of the longest common prefix (LCP) problem in the setting where the two parties have their strings in a compressed form, i.e., represented by the Lempel-Ziv 77 factorization (LZ77) with/without self-references. We present a randomized public-coin protocol for the joint computation of the Hamming distance of two strings represented by LZ77 without self-references. Although our scheme is heavily based on Bille et al.’s LCP protocol, our complexity analysis is original and uses Crochemore’s C-factorization and Rytter’s AVL-grammar. As a byproduct, we also show that LZ77 with self-references and LZ77 without self-references are both non-monotonic, in the sense that their sizes can increase by a factor of 4/3 when a prefix of the string is removed.
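To make the two central objects concrete, the following is a minimal sketch, not the protocol of this paper. It computes the Hamming distance of two uncompressed strings directly, and a naive greedy LZ77 factorization without self-references. It assumes one common variant of the definition, in which each factor is either a fresh character or the longest prefix of the remaining suffix that occurs entirely inside the already processed prefix; the paper's exact definition may differ. The function names `hamming_distance` and `lz77_no_self_ref` are illustrative only.

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def lz77_no_self_ref(s: str) -> list[str]:
    """Naive greedy LZ77 factorization without self-references.

    Assumed variant: a factor starting at position i is the longest
    prefix of s[i:] that occurs entirely within s[:i], or the single
    character s[i] if no such non-empty prefix exists.
    This is a quadratic-or-worse illustration, not an efficient algorithm.
    """
    factors = []
    i, n = 0, len(s)
    while i < n:
        length = 0
        # Extend the candidate factor while it still occurs in s[:i].
        while i + length < n and s[i:i + length + 1] in s[:i]:
            length += 1
        length = max(length, 1)  # a fresh character forms its own factor
        factors.append(s[i:i + length])
        i += length
    return factors


if __name__ == "__main__":
    print(hamming_distance("karolin", "kathrin"))
    print(lz77_no_self_ref("abababab"))
```

Under the same assumptions, non-monotonicity can be explored by comparing `len(lz77_no_self_ref(s))` with `len(lz77_no_self_ref(s[k:]))` for a string `s` and a prefix length `k`.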

[1]  Wojciech Rytter  Application of Lempel-Ziv factorization to the approximation of grammar-based compression , 2003 .

[2]  Abraham Lempel,et al.  A universal algorithm for sequential data compression , 1977, IEEE Trans. Inf. Theory.

[3]  David Thomas,et al.  The Art in Computer Programming , 2001 .

[4]  Jakub Radoszewski,et al.  Streaming k-mismatch with error correcting and applications , 2020, Inf. Comput.

[5]  Ely Porat,et al.  Space lower bounds for online pattern matching , 2013, Theor. Comput. Sci.

[6]  Andrew Chi-Chih Yao,et al.  Some complexity questions related to distributive computing (Preliminary Report) , 1979, STOC.

[7]  M. Adelson-Velskii,et al.  An algorithm for the organization of information , 1963 .

[8]  Max Crochemore  Linear searching for a square in a word , 1984, Bull. EATCS.

[9]  Dominik Kempa,et al.  At the roots of dictionary compression: string attractors , 2017, STOC.

[10]  Pawel Gawrychowski,et al.  Streaming Dictionary Matching with Mismatches , 2018, Algorithmica.

[11]  Donald Ervin Knuth,et al.  The art of computer programming, Volume III, 2nd Edition , 1998 .

[12]  Antonio Restivo,et al.  A combinatorial view on string attractors , 2021, Theor. Comput. Sci.

[13]  Guillaume Lagarde,et al.  Lempel-Ziv: a "one-bit catastrophe" but not a tragedy , 2018, SODA.

[14]  Wojciech Rytter  Application of Lempel-Ziv factorization to the approximation of grammar-based compression , 2003, Theor. Comput. Sci.

[15]  James A. Storer,et al.  Data compression via textual substitution , 1982, JACM.

[16]  Jakub Radoszewski,et al.  Quasi-Periodicity in Streams , 2019, CPM.

[17]  Abraham Lempel,et al.  Compression of individual sequences via variable-rate coding , 1978, IEEE Trans. Inf. Theory.

[18]  Philip Bille,et al.  Compressed Communication Complexity of Longest Common Prefixes , 2018, SPIRE.

[19]  Markus Jalsenius,et al.  Parameterized Matching in the Streaming Model , 2013, STACS.

[20]  G. Navarro,et al.  Towards a Definitive Measure of Repetitiveness , 2019, LATIN.