Tree Structure Compression with RePair

Larsson and Moffat's RePair algorithm is generalized from strings to trees. The new algorithm, TreeRePair, produces straight-line linear context-free tree (SLT) grammars that are smaller than those produced by previous grammar-based compressors such as BPLEX. Experiments show that a Huffman-based coding of the resulting grammars gives compression ratios comparable to those of the best known XML file compressors. Moreover, SLT grammars can serve as an efficient in-memory representation of trees. Our investigations show that tree traversals over TreeRePair grammars are 14 times slower than over pointer structures and 5 times slower than over succinct trees, while using only 1/43 and 1/6 of the memory, respectively.
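As a rough illustration of the starting point, the following is a minimal sketch of RePair on strings, the setting that the abstract says is generalized to trees. It is not TreeRePair and not the authors' implementation: it rescans the whole sequence in every round instead of using the priority queues of an efficient implementation, and the function names (`count_pairs`, `replace_pair`, `repair`, `expand`) and the convention of using integers as nonterminals are assumptions made for this example.

```python
from collections import Counter


def count_pairs(seq):
    """Count non-overlapping occurrences of each adjacent pair."""
    counts = Counter()
    last_pos = {}  # position of the last counted occurrence of each pair
    for i in range(len(seq) - 1):
        pair = (seq[i], seq[i + 1])
        # In a run such as "aaa", the pair (a, a) at i overlaps the one at i - 1.
        if last_pos.get(pair) == i - 1:
            continue
        counts[pair] += 1
        last_pos[pair] = i
    return counts


def replace_pair(seq, pair, symbol):
    """Replace occurrences of `pair` by `symbol`, scanning left to right."""
    out = []
    i = 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(symbol)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def repair(text):
    """Return (start sequence, rules); rules map integer nonterminals to pairs."""
    seq = list(text)  # terminals are single-character strings
    rules = {}
    while True:
        counts = count_pairs(seq)
        if not counts:
            break
        pair, freq = max(counts.items(), key=lambda kv: kv[1])
        if freq < 2:  # no pair repeats, so a new rule would not save space
            break
        nonterminal = len(rules)  # fresh integer symbol (assumed convention)
        rules[nonterminal] = pair
        seq = replace_pair(seq, pair, nonterminal)
    return seq, rules


def expand(symbol, rules):
    """Expand a symbol back into the string it derives."""
    if isinstance(symbol, int):
        left, right = rules[symbol]
        return expand(left, rules) + expand(right, rules)
    return symbol


if __name__ == "__main__":
    start, grammar = repair("abababab")
    print(start, grammar)
    print("".join(expand(s, grammar) for s in start))
```

Each rule has exactly two symbols on its right-hand side and every nonterminal derives exactly one string, which is what makes the result a straight-line grammar. The tree version described in the abstract applies the same idea to repeated patterns in a tree rather than to adjacent symbols in a string.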
