Scalable and queryable compressed storage structure for raster data

Abstract

Compact data structures are storage structures that combine a compressed representation of the data with access mechanisms that retrieve individual data items without decompressing from the beginning. The goal is to keep the data compressed at all times, even in main memory, since the data can be processed directly in that form. This approach brings several benefits: larger datasets fit in main memory, the memory hierarchy is used more effectively, and in a distributed computational scenario bandwidth is saved without spending time compressing and decompressing data during data exchanges. In this work, we follow a compact data structure approach to design a storage structure for raster data, which is commonly used in geographical information systems to represent attributes of space (temperature, pressure, elevation, etc.). As is usual for compact data structures, our new technique not only stores compressed data and accesses it directly, but also indexes its content, thereby accelerating query execution. Previous compact data structures for raster data work well when the raster has few distinct values; however, as the number of distinct values grows, their space consumption and search performance degrade. Our experiments show that our storage structure improves on previous approaches in all aspects, especially when the number of distinct values is large, which is critical for real datasets. Compared with netCDF, a classical format for storing rasters, our method is competitive in space and excels in access and query times.
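
The two properties the abstract relies on, direct access to a single cell without decompressing from the beginning and an index over the content that speeds up queries, can be made concrete with a small Python sketch. This is not the structure proposed in this work: it is a deliberately simple stand-in that splits the raster into fixed b × b blocks, stores each block's minimum and maximum, and bit-packs the offsets (value − min) with the smallest fixed width that fits. The class `BlockedRaster`, its method names, the block size, and the assumption of an integer-valued raster are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    row0: int    # top-left cell of the block
    col0: int
    rows: int    # block may be smaller than b x b at the raster edges
    cols: int
    vmin: int    # per-block minimum (also part of the min/max index)
    vmax: int    # per-block maximum
    width: int   # bits per packed offset; 0 for a uniform block
    bits: int    # packed offsets; a Python int stands in for a bit array


class BlockedRaster:
    """Hypothetical block-based compact raster (illustrative sketch only)."""

    def __init__(self, grid: List[List[int]], b: int = 4):
        self.n_rows, self.n_cols, self.b = len(grid), len(grid[0]), b
        self.blocks_per_row = (self.n_cols + b - 1) // b
        self.blocks: List[Block] = []
        for r0 in range(0, self.n_rows, b):
            for c0 in range(0, self.n_cols, b):
                rows = min(b, self.n_rows - r0)
                cols = min(b, self.n_cols - c0)
                # Cells of the block in row-major order.
                cells = [grid[r][c] for r in range(r0, r0 + rows)
                         for c in range(c0, c0 + cols)]
                vmin, vmax = min(cells), max(cells)
                width = (vmax - vmin).bit_length()
                bits = 0
                for k, v in enumerate(cells):
                    bits |= (v - vmin) << (k * width)  # fixed-width field k
                self.blocks.append(
                    Block(r0, c0, rows, cols, vmin, vmax, width, bits))

    def get(self, r: int, c: int) -> int:
        """Decode one cell: locate its block, then extract one field."""
        blk = self.blocks[(r // self.b) * self.blocks_per_row + c // self.b]
        k = (r - blk.row0) * blk.cols + (c - blk.col0)
        mask = (1 << blk.width) - 1
        return blk.vmin + ((blk.bits >> (k * blk.width)) & mask)

    def cells_in_range(self, lo: int, hi: int) -> List[Tuple[int, int]]:
        """Return all cells whose value lies in [lo, hi]."""
        out: List[Tuple[int, int]] = []
        for blk in self.blocks:
            if blk.vmax < lo or blk.vmin > hi:
                continue  # block cannot contain a match: nothing decoded
            cells = [(blk.row0 + i, blk.col0 + j)
                     for i in range(blk.rows) for j in range(blk.cols)]
            if lo <= blk.vmin and blk.vmax <= hi:
                out.extend(cells)  # every cell matches: nothing decoded
                continue
            out.extend((r, c) for r, c in cells if lo <= self.get(r, c) <= hi)
        return out


grid = [
    [5, 5, 5, 5, 9, 9],
    [5, 5, 5, 5, 9, 8],
    [5, 5, 6, 5, 7, 8],
    [5, 5, 5, 5, 8, 8],
    [1, 2, 3, 4, 5, 6],
]
raster = BlockedRaster(grid, b=4)
print(raster.get(2, 2))                      # → 6
print(sorted(raster.cells_in_range(7, 8)))   # → [(1, 5), (2, 4), (2, 5), (3, 4), (3, 5)]
```

Reading one cell touches a single block and extracts one fixed-width field, so the cost does not depend on where the cell lies in the raster. The per-block minimum and maximum act as the index: a value-range query skips blocks whose interval misses the query and reports blocks that lie entirely inside it without decoding any value. A uniform block costs zero payload bits, while a block spanning a wide value range pays for a wider field.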
