Storing and Querying Semi-structured Spatio-Temporal Data in HBase

With the development of remote sensing, positioning, and related technologies, a large amount of spatio-temporal data requires effective management. Much existing work has focused on how to use HBase to store and quickly retrieve structured spatio-temporal data. However, some spatio-temporal data exists in semi-structured documents, such as the metadata that describes remote sensing products. In this context, a query combines a spatio-temporal query with a semi-structured query (XPath), a combination that has been less studied in previous work. In this paper, we focus on how to efficiently and economically store and query semi-structured spatio-temporal data in HBase. First, we present a formal description of the problem. Second, we propose the HSSST storage model, which uses the semi-structured approach TwigStack. On this basis, we implement semi-structured spatio-temporal range queries and kNN queries. Experiments are conducted on a real dataset and compared against MongoDB, which needs a higher hardware configuration. The results show that on moderately configured machines the semi-structured spatio-temporal query algorithms outperform MongoDB, which is an advantage in real applications.
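To make the combined query concrete, the following is a minimal in-memory sketch of what "spatio-temporal query + XPath" could mean semantically. It is not the HSSST model, does not use HBase or TwigStack, and scans every record by brute force. The `Record` layout, the function names, the sample XML metadata, and the use of planar distance on longitude/latitude for kNN are all assumptions made for illustration.

```python
import math
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical record: a point in space and time plus XML metadata."""
    lon: float
    lat: float
    t: int      # acquisition time, e.g. epoch seconds (assumed unit)
    xml: str    # semi-structured metadata document


def matches_xpath(xml_text, path):
    """Return True if the document has at least one element matching path.

    ElementTree supports only a limited XPath subset, which is enough here.
    """
    root = ET.fromstring(xml_text)
    return root.find(path) is not None


def range_query(records, lon_range, lat_range, t_range, path):
    """Keep records inside the space-time box whose metadata matches path."""
    hits = []
    for r in records:
        in_space = (lon_range[0] <= r.lon <= lon_range[1]
                    and lat_range[0] <= r.lat <= lat_range[1])
        in_time = t_range[0] <= r.t <= t_range[1]
        if in_space and in_time and matches_xpath(r.xml, path):
            hits.append(r)
    return hits


def knn_query(records, lon, lat, k, t_range, path):
    """Return the k nearest matching records within the time window.

    Distance is planar Euclidean on (lon, lat), a simplifying assumption.
    """
    candidates = [r for r in records
                  if t_range[0] <= r.t <= t_range[1]
                  and matches_xpath(r.xml, path)]
    candidates.sort(key=lambda r: math.hypot(r.lon - lon, r.lat - lat))
    return candidates[:k]


# Made-up sample data for illustration only.
records = [
    Record(116.4, 39.9, 100,
           "<product><sensor><name>MODIS</name></sensor></product>"),
    Record(116.5, 39.8, 200,
           "<product><sensor><name>Landsat</name></sensor></product>"),
    Record(10.0, 50.0, 150,
           "<product><sensor><name>MODIS</name></sensor></product>"),
]
path = ".//sensor[name='MODIS']"

in_box = range_query(records, (116.0, 117.0), (39.0, 40.0), (0, 300), path)
nearest = knn_query(records, 116.0, 39.0, 2, (0, 300), path)
```

A storage model over HBase would presumably replace the linear scans with row-key and index lookups; the sketch only fixes the intended result set that such a design would need to reproduce.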
