Block Graphs in Practice

Motivated by the rapidly increasing size of genomic databases, code repositories, and versioned texts, several compression schemes have been proposed that work well on highly repetitive strings and also support fast random access, e.g., LZ-End, RLZ, GDC, augmented SLPs, and block graphs. Block graphs have good worst-case bounds, but whether they are practical has remained an open question. We describe an implementation of block graphs that, on several standard datasets, provides better compression and faster random access than competing schemes.