Efficient Historical Query in HBase for Spatio-Temporal Decision Support

Compared with a decade ago, technologies for gathering spatio-temporal data have matured and become easier to use and deploy. As a result, tens of billions or even trillions of sensed records have accumulated, which poses a challenge to spatio-temporal Decision Support Systems (stDSS). Traditional databases can hardly support such volumes and tend to become a performance bottleneck for the analysis platform. In this paper, we therefore argue for using the NoSQL database HBase to replace the traditional back-end storage system. In this context, the well-studied spatio-temporal querying techniques of traditional databases should be carried over to HBase as well. However, this problem has not been solved well in HBase: many previous works tackle it only through schema design, i.e., by designing the row key and column key format, which we do not believe is an effective solution. We instead address the problem at the level of HBase's own architecture and propose an index structure as a built-in component of HBase. STEHIX (Spatio-TEmporal Hbase IndeX) is adapted to the two-level architecture of HBase and is suited to processing spatio-temporal queries. It consists of an index in the meta table (the first level) and a region index (the second level) that indexes the inner structure of HBase regions. Based on this structure, we propose algorithms for three queries: range query, kNN query, and GNN query. Two optimizations are also presented to achieve load balancing and scalable kNN queries. We implement STEHIX and conduct experiments on a real dataset; the results show that our design outperforms a previous work in many aspects.
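
The abstract does not spell out how the two index levels divide the work. As a purely illustrative sketch, and not the STEHIX design itself, the Python below assumes that the first level routes records to regions by timestamp (using sorted split points, loosely analogous to the row-key ranges kept in a meta table) and that each region keeps a uniform spatial grid as its second-level index. The class names `RegionIndex` and `TwoLevelIndex`, the split points, and the cell size are all hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


class RegionIndex:
    """Second level (hypothetical): a uniform grid over the points of one region."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)  # (cx, cy) -> list of (x, y, t, value)

    def _cell(self, x, y):
        return (int(x // self.cell_size), int(y // self.cell_size))

    def insert(self, x, y, t, value):
        self.cells[self._cell(x, y)].append((x, y, t, value))

    def range_query(self, x_min, y_min, x_max, y_max, t_min, t_max):
        cx_min, cy_min = self._cell(x_min, y_min)
        cx_max, cy_max = self._cell(x_max, y_max)
        hits = []
        # Visit only the grid cells that overlap the query rectangle.
        for cx in range(cx_min, cx_max + 1):
            for cy in range(cy_min, cy_max + 1):
                for x, y, t, value in self.cells.get((cx, cy), ()):
                    if (x_min <= x <= x_max and y_min <= y <= y_max
                            and t_min <= t <= t_max):
                        hits.append(value)
        return hits


class TwoLevelIndex:
    """First level (hypothetical): route by timestamp using sorted split points."""

    def __init__(self, time_splits, cell_size):
        self.time_splits = sorted(time_splits)
        # n split points define n + 1 time-partitioned regions.
        self.regions = [RegionIndex(cell_size)
                        for _ in range(len(self.time_splits) + 1)]

    def _region_id(self, t):
        return bisect_right(self.time_splits, t)

    def insert(self, x, y, t, value):
        self.regions[self._region_id(t)].insert(x, y, t, value)

    def range_query(self, x_min, y_min, x_max, y_max, t_min, t_max):
        # Only regions whose time span overlaps [t_min, t_max] are consulted.
        first, last = self._region_id(t_min), self._region_id(t_max)
        hits = []
        for rid in range(first, last + 1):
            hits.extend(self.regions[rid].range_query(
                x_min, y_min, x_max, y_max, t_min, t_max))
        return hits


index = TwoLevelIndex(time_splits=[100, 200], cell_size=10.0)
index.insert(1.0, 1.0, 50, "a")
index.insert(15.0, 15.0, 150, "b")
index.insert(3.0, 4.0, 250, "c")    # outside the queried time window
index.insert(50.0, 50.0, 150, "d")  # outside the queried rectangle
result = sorted(index.range_query(0.0, 0.0, 20.0, 20.0, 40, 160))
print(result)
```

In this toy layout the time window 40 to 160 touches only the first two of the three regions, so the third region is never scanned. That pruning is the general benefit a first-level index is meant to provide, whatever partitioning the actual system uses.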
