Two-Dimensional Block Trees

The Block Tree (BT) is a novel compact data structure designed to compress sequence collections. It achieves compression ratios close to those of Lempel-Ziv while supporting efficient direct access to any substring. The BT recursively divides the text into fixed-size blocks, and blocks whose content appears earlier in the text are replaced by pointers to that earlier occurrence. On repetitive collections, a few blocks suffice to represent all the others, so the BT reduces the size by orders of magnitude. In this paper we extend the BT to two dimensions in order to exploit repetitiveness in collections of images, graphs, and maps. The two-dimensional Block Tree divides the image regularly into subimages and replaces some of them by pointers to other occurrences thereof. We develop a specific variant aimed at compressing the adjacency matrices of Web graphs, obtaining space reductions of up to 50% compared with the k²-tree, which is the best alternative supporting direct and reverse navigation in the graph.
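The idea of dividing a matrix into subimages and replacing repeated ones by pointers can be illustrated with a minimal Python sketch. It is not the structure developed in the paper. It assumes a square binary matrix whose side is a power of k, it only detects repeats that are aligned with the block grid (the text above does not restrict occurrences in this way), and it stores plain objects rather than a compact encoding. The names `Node`, `build`, and `access` are hypothetical.

```python
class Node:
    """One block: a 1x1 leaf (bit), a pointer (target), or an internal node (children)."""

    def __init__(self, size):
        self.size = size
        self.bit = None       # set only for 1x1 leaves
        self.target = None    # set only when the block points to an earlier identical block
        self.children = []    # k*k sub-blocks in row-major order, for internal nodes


def build(matrix, k=2):
    """Build a simplified 2D block tree (assumption: len(matrix) is a power of k)."""
    seen = {}  # block content -> first node built with that content

    def rec(r, c, size):
        node = Node(size)
        if size == 1:
            node.bit = matrix[r][c]
            return node
        key = tuple(tuple(matrix[r + i][c:c + size]) for i in range(size))
        if key in seen:
            # Same content was built earlier: store a pointer instead of a subtree.
            node.target = seen[key]
            return node
        seen[key] = node
        step = size // k
        for i in range(k):
            for j in range(k):
                node.children.append(rec(r + i * step, c + j * step, step))
        return node

    return rec(0, 0, len(matrix))


def access(node, r, c, k=2):
    """Return the cell (r, c) by descending the tree and following pointers."""
    while node.size > 1:
        if node.target is not None:
            node = node.target  # identical content, so the offsets (r, c) stay valid
            continue
        step = node.size // k
        node = node.children[(r // step) * k + (c // step)]
        r, c = r % step, c % step
    return node.bit


if __name__ == "__main__":
    M = [[1, 0, 1, 0],
         [0, 1, 0, 1],
         [1, 0, 0, 0],
         [0, 1, 0, 0]]
    root = build(M)
    print(all(access(root, r, c) == M[r][c] for r in range(4) for c in range(4)))
```

In this toy matrix the top-right and bottom-left quadrants repeat the top-left one, so only two of the four quadrants are expanded into subtrees; the other two are single pointer nodes, and `access` still recovers every cell.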
