SJSON: A succinct representation for JSON documents

Abstract The massive amounts of data processed by modern computational systems pose a problem of increasing importance. This data is commonly stored, directly or indirectly, in data exchange languages such as JSON (JavaScript Object Notation) and XML (eXtensible Markup Language), which provide human-readable, platform-agnostic access. This paper explores a set of succinct representations for JSON documents, which we call SJSON, that reduce both RAM and disk usage while supporting efficient queries on the documents. The proposed representations rest on the idea that a JSON document can be decomposed into a structural part and a raw data part. In our method, we model the structure of the JSON document as a rooted ordered tree and represent it using succinct data structures, as opposed to the usual pointer-based implementation. The remaining raw data is reorganized into arrays of attributes and values. This separation of structure and data allows for a straightforward connection between a node in the succinct tree and its corresponding name-value pair, dispensing with pointers altogether. The proposed scheme is implemented as the SJSON library in C++ and evaluated with respect to a number of metrics, comparing its performance with popular alternative JSON parsers. Empirical results show that the library is able to represent JSON files succinctly while efficiently supporting traversal queries.
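To make the structure/data decomposition concrete, the following is a minimal Python sketch, not the SJSON library's C++ API. It assumes a balanced-parentheses encoding of the tree (one common succinct tree representation) and preorder arrays of names and values; all function names (`encode`, `rank1`, `find_close`, `first_child`, `next_sibling`, `pair`) are hypothetical. The `rank1` and `find_close` operations are naive linear scans here, whereas a succinct implementation would answer them in constant time using small auxiliary indexes.

```python
import json

def encode(doc):
    """Split a parsed JSON value into a balanced-parentheses bit list
    plus preorder arrays of names and raw values (assumed layout)."""
    bits, names, values = [], [], []

    def visit(name, node):
        bits.append(1)                 # open parenthesis for this node
        names.append(name)
        if isinstance(node, dict):
            values.append(None)        # object node: no raw value of its own
            for key, child in node.items():
                visit(key, child)
        elif isinstance(node, list):
            values.append(None)        # array node: no raw value of its own
            for child in node:
                visit(None, child)     # array elements carry no name
        else:
            # Leaf value. A JSON null also becomes None here; a fuller
            # sketch would store a type tag to tell it from a container.
            values.append(node)
        bits.append(0)                 # close parenthesis

    visit(None, doc)
    return bits, names, values

def rank1(bits, pos):
    """Number of open parentheses in bits[0..pos], inclusive (naive scan)."""
    return sum(bits[:pos + 1])

def find_close(bits, pos):
    """Position of the close parenthesis matching the open one at pos."""
    depth = 0
    for i in range(pos, len(bits)):
        depth += 1 if bits[i] else -1
        if depth == 0:
            return i
    raise ValueError("unbalanced parentheses sequence")

def first_child(bits, pos):
    """Open parenthesis of the first child of the node at pos, or None."""
    return pos + 1 if bits[pos + 1] == 1 else None

def next_sibling(bits, pos):
    """Open parenthesis of the next sibling of the node at pos, or None."""
    nxt = find_close(bits, pos) + 1
    return nxt if nxt < len(bits) and bits[nxt] == 1 else None

def pair(bits, names, values, pos):
    """Name-value pair of the node at pos: its preorder rank is the array
    index, so no pointer from tree node to data is stored."""
    i = rank1(bits, pos) - 1
    return names[i], values[i]
```

For a document such as `{"name": "SJSON", "tags": ["tree", "json"], "year": 2016}`, the tree has six nodes and is encoded in twelve bits. Walking from the root with `first_child` and `next_sibling` and calling `pair` at each position recovers the top-level name-value pairs without any per-node pointers.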
